


Psychometrika 





CONTENTS 


EFFICIENCY OF MULTIPLE-CHOICE TESTS AS A FUNCTION
OF SPREAD OF ITEM DIFFICULTIES
LEE J. CRONBACH AND WILLARD G. WARRINGTON


FINITE MARKOV PROCESSES IN PSYCHOLOGY 
GEORGE A. MILLER
AN INTERNAL CONSISTENCY CHECK FOR THE METHOD 
OF SUCCESSIVE INTERVALS AND THE METHOD 
OF GRADED DICHOTOMIES 
ALLEN L. EDWARDS AND L. L. THURSTONE
AN INVESTIGATION OF THE RELATION OF THE RELIABILITY
OF MULTIPLE-CHOICE TESTS TO THE DISTRIBUTION
OF ITEM DIFFICULTIES
FREDERIC M. LORD
ON THE DETERMINATION OF REDUNDANCIES IN SOCIO- 
METRIC CHAINS 
IAN C. ROSS AND FRANK HARARY
MULTIPLE GROUP METHODS FOR COMMON-FACTOR 
ANALYSIS: THEIR BASIS, COMPUTATION, AND 
INTERPRETATION 
LOUIS GUTTMAN
A TECHNIQUE FOR FACILITATING THE ROTATION OF 
FACTOR AXES, BASED ON AN EQUIVALENCE BE- 
TWEEN PERSONS AND TESTS 
JOSEPH SANDLER 
IBM COMPUTATION OF SUMS OF PRODUCTS FOR POSITIVE 
AND NEGATIVE NUMBERS 
PAUL J. BURKE
PSYCHOMETRIC MONOGRAPHS ANNOUNCEMENT 
THE INTER-AMERICAN SOCIETY OF PSYCHOLOGY


PREPARATION OF PROBLEM AND SOURCE MATERIALS 
FOR THE MATHEMATICAL TRAINING OF SCIENTISTS








VOLUME SEVENTEEN JUNE 1952 NUMBER TWO 








STATEMENT FROM THE MANAGING EDITOR 


The editors of Psychometrika sincerely regret the delays in the publication 
dates of the last several issues. The Dentan Printing Company, which had 
printed the journal since its inception in 1936, has found it increasingly diffi- 
cult to find sufficient personnel trained in the meticulous type-setting required 
for the journal. Hence we are sorry to announce that this company was forced 
to discontinue publishing the journal with the March 1952 issue. Beginning 
with the June 1952 issue, Psychometrika will be published by the William 
Byrd Press, 1407 Sherwood Avenue, Richmond 5, Virginia. It is hoped that 
the journal will shortly be back on schedule, which calls for mailing by the 15th 
of the month of issue. 


Attention is directed to an error in the Table of Contents for the March 
1952 issue. The author of the article, “A Factorial Study of Temperament,”
is Melany E. Baehr, not John W. French as the Table of Contents states. 














PSYCHOMETRIKA, VOL. 17, NO. 2
JUNE, 1952 


EFFICIENCY OF MULTIPLE-CHOICE TESTS AS A FUNCTION OF 
SPREAD OF ITEM DIFFICULTIES* 


LEE J. CRONBACH AND WILLARD G. WARRINGTON 
UNIVERSITY OF ILLINOIS 


The validity of a univocal multiple-choice test is determined for varying 
distributions of item difficulty and varying degrees of item precision. Validity 
is a function of $\sigma_d^2 + \sigma_y^2$, where $\sigma_d$ measures item unreliability and $\sigma_y$ measures
the spread of item difficulties. When this variance is very small, validity
is high for one optimum cutting score, but the test gives relatively little valid 
information for other cutting scores. As this variance increases, eta increases 
up to a certain point, and then begins to decrease. Screening validity at the 
optimum cutting score declines as this variance increases, but the test 
becomes much more flexible, maintaining the same validity for a wide range 
of cutting scores. For items of the type ordinarily used in psychological 
tests, the test with uniform item difficulty gives greater over-all validity, and 
superior validity for most cutting scores, compared to a test with a range of 
item difficulties. When a multiple-choice test is intended to reject the poorest 
F per cent of the men tested, items should on the average be located at or 
above the threshold for men whose true ability is at the Fth percentile. 


Psychometric literature contains many articles which imply, directly or 
indirectly, that the efficiency of a test may be increased by using items 
more homogeneous in difficulty. Particularly when a test is to be used for 
dividing a group of persons being processed into accepted and rejected 
candidates, a test is desired which will discriminate the poorest acceptable 
men from the best of the rejected group, and no premium is placed on ability 
of the test to discriminate within the subgroups. Hence it has been sug- 
gested that for tests designed to identify the best x per cent of applicants, 
items should be placed close to the level where just x per cent can pass each 
item. 

At the time this study was undertaken, the most definitive contribution 
on the subject had been made by Gulliksen (3). His rationale leads to the 
conclusion that test reliability and variance are maximized when items are 
at the same difficulty level, and he noted that this recommendation conflicts 
with the conventional practice of spreading items widely in difficulty. His 
solution is specifically limited to the free-response case, where students have 
no probability of passing items by chance. Gulliksen evaluates test efficiency 
in terms of a reliability coefficient of the product-moment type. 

*This research was performed under contract Nop 536 with the Bureau of Naval
Personnel, and received additional support from the Bureau of Research and Service,
College of Education, University of Illinois.

While working at the Naval Electronics Laboratory, the senior author
applied Gulliksen’s suggestion, along with several other design changes, to
the classification test used to measure pitch discrimination in potential 
sonarmen. The results of preliminary experiments confirmed the expectation 
that screening efficiency would be improved by using nearly uniform item 
difficulty throughout the test. At this point, the late E. G. Brundage sug- 
gested that more definitive studies were needed before adopting the recom- 
mendation to use peaked (uniform difficulty) tests. In particular, while a 
test designed to pass just 45 per cent of recruits might be ideal for screening 
at that level, if a change in the manpower supply made it necessary to lower 
the standard, taking from the top 60 per cent, or permitted raising the cutting 
point to take only 30 per cent, the test peaked at the 45-per cent point might 
be inferior to a test of conventional design. As work progressed on this 
problem, it became equally important to examine the effect of multiple- 
choice “do-guess” directions upon the recommendation, which had previously 
been applied only to the free-response case. Moreover, careful consideration 
was given to selecting the best function for evaluating test efficiency. 

This study is restricted to three-choice items on which the following 
limitations are imposed: 


1. All items in any test measure the same underlying ability with equal 
saturation. The tetrachoric correlation between items after correction for 
chance success (2) is uniform for all item-pairs. 

2. The underlying ability is normally distributed in the population 
tested.

3. For persons whose true ability is far below the level required to pass 
any item, the probability of selecting the correct alternative for the item is 
one-third. 

4. The person’s performance on any item is experimentally independent 
of his performance on any other item. 

5. The probability that a person will succeed on an item is a function 
of his ability, and this function is described by the integral of the normal 
curve, an ogive asymptotic to y = 1.00 and y = 1/3. 

6. The test score is the sum of the number of items passed. 
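
Limitations 3 and 5 together fix the form of the item response curve. The following is a minimal sketch of that curve, assuming the chance floor of one-third is combined with a normal ogive in the usual way (chance plus two-thirds of the ogive). The function name `p_pass`, its argument names, and the convention of 0.5 at the exact step point for a perfectly precise item are illustrative choices, not taken from the paper.

```python
from statistics import NormalDist

CHANCE = 1.0 / 3.0  # limitation 3: three-choice items, floor of one-third


def p_pass(x, y=0.0, sigma_d=0.5, chance=CHANCE):
    """Probability that a person of true ability x passes an item of scale value y.

    sigma_d is the s.d. of the normal curve whose integral is the ogive
    (assumed parameterization; sigma_d = 0 gives a step function at x = y).
    """
    if sigma_d == 0:
        # Perfectly precise item: pass (apart from guessing) only above y.
        knows = 1.0 if x > y else (0.5 if x == y else 0.0)
    else:
        knows = NormalDist().cdf((x - y) / sigma_d)
    # Ogive asymptotic to 1.00 above and to the chance level below.
    return chance + (1.0 - chance) * knows


print(round(p_pass(0.0), 3))    # ability equal to the item's scale value
print(round(p_pass(-10.0), 3))  # ability far below the item's scale value
```

Under this sketch a person whose ability equals the scale value passes two-thirds of the time, which is 50 per cent after correction for chance.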


Our study investigates how the screening efficiency of a test varies with 
changes in the spread of item difficulty, for various degrees of item reliability. 
Minor analyses indicate the effect of change in test length and of level of 
item difficulty on the efficiency. Attention is also directed to the question, 
what level of item difficulty offers maximum screening validity for multiple- 
choice tests? 

While this study was in progress, the fundamental investigation by 
F. M. Lord became available (4). This thesis, influenced in part by the sug- 
gestions of Gulliksen and Tucker, attempts a rational analysis of the problem 
of item difficulty and test validity. Save for restricting himself to the free- 
response case, his assumptions are identical to ours and his conclusions are 














LEE J. CRONBACH AND WILLARD G. WARRINGTON 129 


in general accord with ours. Pending the publication of his attempts at 
rational treatment of the multiple-choice case, we have only the present 
empirical findings to guide test construction of this type, since the generali- 
zations from the free-response case require alteration when guessing is 
possible. Because of our concern with the flexibility of a test as cutting scores 
are changed, we have used a different function than Lord’s to evaluate 
validity, and this introduces minor differences into the results. Lord’s con- 
clusions which relate most closely to ours are as follows: 


4. Maximum discrimination at a given ability level, as defined by 
the discrimination index developed here, is provided by a test 
composed of items all of equal difficulty such that examinees 
at the given ability level will have a fifty per cent chance of 
answering each item correctly. 

5. There are strong indications, provided the item intercorrelations 
are not extraordinarily high, that a test composed entirely of 
items of fifty per cent difficulty will be more discriminating for 
practically all examinees in the group tested than will any test
characterized by a spread of item difficulties. If the examinees 
have a normal distribution of ability, for example, the former 
test will be the most discriminating for all examinees except 
those who are more than, say, two-and-a-half standard deviations 
from the mean. 

6. The shape of the frequency distribution of test scores or of true 
scores does not necessarily reflect the shape of the frequency 
distribution of ability. Sufficiently high tetrachoric item inter- 
correlations (.50 or higher) will produce rectangular or U-shaped 
distributions of test scores and of true scores even for groups 
having a normal distribution of ability. (The construction of 
tests that will produce rectangular or U-shaped score distribu- 
tions has been urged more than once in the recent literature; this 
goal can be approached, but its actual achievement, when the 
examinees have a normal distribution of ability, requires higher 
item intercorrelations than are at present usually obtained with 
most types of test items.) 


“The foregoing conclusions have here been derived only for the case
where the test items cannot be answered correctly by guessing” (4, p. 75).

Our study may also be compared to the pioneer rational treatment of 
the problem by Richardson (5). Our theoretical model adopts the ideas 
Richardson used, save that we are concerned with three-choice items. Our 
question differs from his in that he was concerned with the effect of changing 
the level of item difficulty whereas we have been concerned with changing 
the range of item difficulty. Our approach permits consideration of a range 
of item reliabilities. These extensions, while allowing our work to confirm
the majority of his findings, introduce some modifications into the implications
of his work. In particular, it will be seen that our findings stress the
flexibility of peaked tests, under ordinary conditions, even when the cutting 
level shifts quite a bit from the difficulty level of the items. 


Procedure 


The method we employ is neither rational analysis nor the empirical 
treatment of actual data. We make an empirical study using hypothetical 
data designed to meet our conditions and to represent the type of item used 
in the Navy pitch test. We define the following variables: 


x The true ability of persons on the underlying variable of our test.
x is normally distributed, with a mean of zero and an s.d. of 1.
x constitutes our criterion.

$y_i$ The scale-value of item i, expressed on the same scale of ability as
x. 67 per cent of the persons whose true ability is $y_i$ pass the item
(50 per cent after correction for chance).

$\sigma_d$ The standard deviation of the normal curve whose integral is the
ogive that defines $p_{i \cdot x}$ as a function of x. $p_{i \cdot x}$ is the probability that
a person with true ability x will pass item i. h, the psychophysical
measure of precision, is $\sqrt{2}/(2\sigma_d)$. $\sigma_d$ expresses that error of measurement
which results from lack of precision in a single observation,
apart from guessing.


Our steps are as follows: 


TABLE 1
Relation of $\sigma_d$ to Item Reliability

          Correlation with similar       Correlation with similar
          item if probability of         item if probability of
          chance success by guessing     chance success by guessing
          is zero                        is one-third

$\sigma_d$      $r_{tet}$       phi*            $r_{tet}$       phi*
0         1.00        1.00            .70         .50
.2         .94         .80            .55         .37
.5         .82         .60            .40         .20
1.0        .50         .34            .25         [illegible]
2.0        .23         .13            .12         .07




*Values in these columns will vary according to item difficulty. These 
computations are based on items where y = 0. Our computational procedure 
was to determine the probability that a person with a given criterion score 
would pass both, one only, or neither, of two items with the same precision and 
y = 0. These conditional probabilities were multiplied by the probability 
of each x value, and summed to form a four-fold table from which the coeffici- 
ents could be determined. 
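
The computational procedure in the footnote to Table 1 can be sketched numerically. The code below is an illustration under stated assumptions, not the authors' worksheet: it assumes the ogive form "chance plus (1 - chance) times a normal ogive", integrates over x on an evenly spaced grid (the grid width and size are arbitrary choices), and forms the four-fold table for two items of equal precision at y = 0. The function name `phi_between_like_items` is hypothetical.

```python
import math
from statistics import NormalDist


def phi_between_like_items(sigma_d, chance=0.0, y=0.0, lim=6.0, n_grid=2401):
    """Phi coefficient between two items of equal precision and scale value y."""
    nd = NormalDist()
    half = (n_grid - 1) // 2
    both = one_only = neither = 0.0
    for k in range(n_grid):
        x = (k - half) * lim / half          # criterion score on the grid
        weight = nd.pdf(x)                   # relative frequency of this x
        if sigma_d == 0:
            knows = 1.0 if x > y else (0.5 if x == y else 0.0)
        else:
            knows = nd.cdf((x - y) / sigma_d)
        p = chance + (1.0 - chance) * knows  # conditional probability of passing
        # Items are experimentally independent given x (limitation 4).
        both += weight * p * p
        one_only += weight * 2.0 * p * (1.0 - p)
        neither += weight * (1.0 - p) ** 2
    total = both + one_only + neither
    a, d = both / total, neither / total     # pass-pass and fail-fail cells
    b = c = one_only / (2.0 * total)         # the two pass-fail cells
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


for s in (0.0, 0.5, 1.0, 2.0):
    print(s, round(phi_between_like_items(s), 2),
          round(phi_between_like_items(s, chance=1.0 / 3.0), 2))
```

Under these assumptions, guessing at a one-third chance level halves phi when y = 0, which agrees with the first row of Table 1 (1.00 falling to .50). Other rows of the printed table may differ somewhat from this sketch, since the original values were computed by hand from coarser groupings.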













Step 1. The ogive $p_{i \cdot x} = f(y_i - x)$ is determined for five values of
$\sigma_d$, namely 0, 0.2, 0.5, 1.0, and 2.0. Because guessing as well as lack of precision
lowers the item intercorrelation, $\sigma_d$ is related to item reliability as
indicated in Table 1. In practice, it should be realized, item correlations are
ordinarily low. Hence $\sigma_d$ will rarely be less than 0.5. The following steps use
one value of $\sigma_d$ at a time.

Step 2. The conditional probability $p_{i \cdot x}$ is known from Step 1, for any
value of $y - x$. The conditional probability that the person will earn any
score s from zero to n on a test of n items of uniform difficulty is obtained
by the binomial theorem.

Step 3. The expected relative frequency of any score x is known from
the normal curve. Multiplying the conditional probability of any score
$p_{s \cdot x}$ by $p_x$ gives the joint probability $p_{sx}$. Arraying these against s and x
gives the bivariate distribution of score and criterion.

Step 4. The validity coefficient (see below) is determined for each 
cutting score. 

Step 5. To study tests not uniform in difficulty, short component tests
were constructed in which y was held constant (Step 2). The conditional
probability matrices for these components were then multiplied to get the
conditional probability of any desired combination. Then Steps 3 and 4
were followed. For example, one thirty-item test was constructed by multiplying
matrices for five six-item peaked tests.


TABLE 2

Test Patterns Studied, Described in Terms of Number of Items at Each Scale Position

Length     Scale Value   Pattern A   Pattern B   Pattern C   Pattern D
of Test        (y)
  30          +2.5                                               3
              +2.0                                               3
              +1.5                                               3
              +1.0                                   6           3
              +0.5                       6           6           3
               0            30          18           6           3
              −0.5                       6           6           3
              −1.0                                   6           3
              −1.5                                               3
              −2.0                                               3

  18          +0.5                                   6
               0            18                       6
              −0.5                                   6

  12          +0.5                                   6
               0            12
              −0.5                                   6

   6           0*            6

*6-item tests were also studied for scale values y = +1.0, +0.5, −0.5, and −1.0.
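
Steps 1 through 3 and Step 5 can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' procedure: the ogive is taken as chance plus two-thirds of a normal ogive, the criterion is placed on an evenly spaced grid, and "multiplying" the component matrices is interpreted as convolving the conditional score distributions of independent components at each ability level. All function names are hypothetical; the example pattern is Pattern B (6 items at +0.5, 18 at 0, 6 at −0.5).

```python
import math
from statistics import NormalDist

ND = NormalDist()
CHANCE = 1.0 / 3.0  # three-choice items


def p_pass(x, y, sigma_d):
    """Step 1: probability of passing an item of scale value y at ability x."""
    if sigma_d == 0:
        knows = 1.0 if x > y else (0.5 if x == y else 0.0)
    else:
        knows = ND.cdf((x - y) / sigma_d)
    return CHANCE + (1.0 - CHANCE) * knows


def binomial_scores(n, p):
    """Step 2: P(score = s | x) for n items of uniform difficulty."""
    return [math.comb(n, s) * p**s * (1.0 - p) ** (n - s) for s in range(n + 1)]


def convolve(a, b):
    """Step 5: score distribution for two independent components combined."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def joint_distribution(pattern, sigma_d, lim=4.0, n_grid=161):
    """Return (xs, joint) with joint[k][s] = P(criterion = xs[k], score = s).

    pattern maps each scale value y to the number of items placed there.
    """
    half = (n_grid - 1) // 2
    xs = [(k - half) * lim / half for k in range(n_grid)]
    density = [ND.pdf(x) for x in xs]
    total = sum(density)
    joint = []
    for x, d in zip(xs, density):
        cond = [1.0]  # score distribution of an empty test
        for y, n in pattern.items():
            cond = convolve(cond, binomial_scores(n, p_pass(x, y, sigma_d)))
        joint.append([d / total * c for c in cond])  # Step 3: weight by p_x
    return xs, joint


pattern_b = {0.5: 6, 0.0: 18, -0.5: 6}
xs, joint = joint_distribution(pattern_b, sigma_d=0.5)
score_marginal = [sum(row[s] for row in joint) for s in range(31)]
print([round(v, 3) for v in score_marginal])
```

The rows of `joint` form the bivariate distribution of score and criterion from which a validity coefficient can be computed at each cutting score (Step 4).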


Table 2 lists the patterns examined. In choosing patterns of item diffi- 
culty for study, an attempt was made to include a wide variety without 
unduly increasing computation. Pattern A, the “peaked” test, is examined
at all lengths and values of $\sigma_d$. Pattern B is moderately peaked, Pattern C
less so, and Pattern D has virtually a flat distribution of item scale values.
It is perhaps more common to think of item difficulty in terms of $p_i$. After
correction for chance success, $p_i$ has a non-linear relation to $y_i$; where
$\sigma_d$ = 0, the relation is as follows:


y    +3    +2    +1     0    −1    −2    −3
p    0.1    2    16    50    84    98    99.9


For less precise items ($\sigma_d$ larger) p is smaller than these values when y < 0
and greater when y > 0. Thus for Pattern C, $\sigma_d$ = 0, the distribution of
item p’s is U-shaped, with many very hard and very easy items. It happened 
that the pitch test, based on a physical scale, had items spaced according 
to scale units rather than item p’s. In any case our patterns are sufficiently 
consecutive that inferences regarding intermediate patterns are not hard to 
make. 
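
The relation between scale value y and per cent passing can be checked with a short sketch. It assumes, as in the model above, a unit-normal ability distribution and a normal ogive with s.d. $\sigma_d$, so that after correction for chance the proportion passing is the normal integral of $-y/\sqrt{1+\sigma_d^2}$. That closed form is a derivation under those assumptions, not a formula printed in the paper, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def percent_passing(y, sigma_d=0.0):
    """Per cent of a unit-normal population passing an item of scale value y,
    after correction for chance success.

    Passing (apart from guessing) is treated as x + e > y with x ~ N(0, 1)
    and e ~ N(0, sigma_d^2), so x + e has s.d. sqrt(1 + sigma_d^2).
    """
    return 100.0 * NormalDist().cdf(-y / math.sqrt(1.0 + sigma_d**2))


for y in (3, 2, 1, 0, -1, -2, -3):
    print(f"y = {y:+d}   p = {percent_passing(y):.1f}   "
          f"p at sigma_d = 1.0: {percent_passing(y, 1.0):.1f}")
```

With `sigma_d = 0` the values round to the tabled figures, and with a larger `sigma_d` the percentages move toward 50 on both sides, as the text states.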

Evaluation function. To judge the screening efficiency of a test, one
thinks first of the phi coefficient and biserial r. Men are to be dichotomized,
and errors of classification are to be avoided. Where the scores are to be
used in no other manner, there is no need to discriminate among men well
above or below the point of cut. Suppose F per cent of the men are to be
eliminated. Phi might be determined by dividing the x scale at that point
above which F per cent of the men fall, dividing the score scale similarly,
and computing from the four-fold table. Biserial r would use the continuous x-scale
with the dichotomized s scale. The difference is that biserial r weights
errors of classification according to their distance from the point of intended
cut (i.e., their “seriousness”). Phi is closely related to the number of misses
and false positives, being the ratio of actual “hits in excess of chance expectancy”
to possible “hits in excess of chance.” Preliminary studies showed
that conclusions for phi and for $r_{bis}$ differ negligibly.
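
The two evaluation functions can be sketched for a single cutting score. The code below is a hedged illustration, not the authors' computation: it builds the bivariate distribution for a peaked test on a grid (assumed ogive form, requiring `sigma_d` > 0 here), dichotomizes the score at `cut`, and computes biserial r with the standard textbook formula and phi from the four-fold table. The helper names and the choice of a 30-item test with a cut at 20 are assumptions for the example.

```python
import math
from statistics import NormalDist

ND = NormalDist()


def peaked_joint(n_items, sigma_d, y=0.0, chance=1.0 / 3.0, lim=4.0, n_grid=161):
    """Joint distribution of criterion x and score s for a peaked test (sigma_d > 0)."""
    half = (n_grid - 1) // 2
    xs = [(k - half) * lim / half for k in range(n_grid)]
    dens = [ND.pdf(x) for x in xs]
    total = sum(dens)
    joint = []
    for x, d in zip(xs, dens):
        p = chance + (1.0 - chance) * ND.cdf((x - y) / sigma_d)
        joint.append([d / total * math.comb(n_items, s) * p**s * (1.0 - p) ** (n_items - s)
                      for s in range(n_items + 1)])
    return xs, joint


def validity_at_cut(xs, joint, cut):
    """Reject scores below `cut`; return (fraction rejected, phi, biserial r)."""
    px = [sum(row) for row in joint]                  # marginal of the criterion
    rejected = [sum(row[:cut]) for row in joint]      # P(x, score < cut)
    q = sum(rejected)
    p = 1.0 - q
    mean_x = sum(x * w for x, w in zip(xs, px))
    sd_x = math.sqrt(sum(w * (x - mean_x) ** 2 for x, w in zip(xs, px)))
    mean_rej = sum(x * r for x, r in zip(xs, rejected)) / q
    mean_acc = (mean_x - q * mean_rej) / p
    # Biserial r: continuous criterion against the dichotomized score.
    ordinate = ND.pdf(ND.inv_cdf(q))
    r_bis = (mean_acc - mean_rej) * p * q / (ordinate * sd_x)
    # Phi: also dichotomize the criterion at the point cutting off fraction q.
    x_cut = ND.inv_cdf(q)
    low = sum(w for x, w in zip(xs, px) if x < x_cut)
    both_low = sum(r for x, r in zip(xs, rejected) if x < x_cut)
    phi = (both_low - q * low) / math.sqrt(q * p * low * (1.0 - low))
    return q, phi, r_bis


xs, joint = peaked_joint(n_items=30, sigma_d=0.5)
q, phi, r_bis = validity_at_cut(xs, joint, cut=20)
print(round(q, 2), round(phi, 2), round(r_bis, 2))
```

Repeating `validity_at_cut` for every possible cut traces out one validity curve of the kind plotted in the figures, apart from any smoothing the authors applied.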

Unlike product-moment r, $r_{bis}$ is independent of the test score metric.
Our results are therefore invariant as test scores are transformed to other
scales. It was thought possible that transformations of the test score distribution
would seriously alter the product-moment correlation of the test
with a normally distributed criterion. As another part of our inquiry, we
have determined validity in terms of $\eta$. Lord uses as his evaluation function
an index which is not invariant under transformation of scores but which,
unlike our functions, is independent of the range of ability in the population
tested.

The reader should note that we have used a sort of index of reliability, 
the correlation of test score with normalized true score, as a validity coefficient. 
This is legitimate for a test of pitch, a very pure function, where true threshold 
is a good criterion. In the case of a test where items contain a common factor 
other than the criterion factor, which is ruled out by our assumption (1), it 
would not be correct to regard our $r_{bis}$ as a validity coefficient.


Results: Screening Validity 


Figures 1 and 2 present the results of our computations for thirty-item
tests with varying patterns and varying $\sigma_d$. Each curve shows the screening
validity at each possible cutting score, for one test. For a test having a
limited number of items, validity is lower when a cut is made at any percentile
point not falling midway between two items. This would give our
curves a “scalloped” effect, and has not been shown in these charts as we
are concerned with main effects. In Figure 4 an unsmoothed curve showing
these dips is plotted.


FIGURE 1
Validity Curves for Various Patterns
(30 items, $\sigma_d$ = 0; Patterns A, B, C, and D. Horizontal axis: per cent of cases below cutting score. Vertical axis: validity.)

FIGURE 2
Validity Curves for Various Patterns
(Four panels, 30 items each, for $\sigma_d$ = .2, .5, 1.0, and 2.0. Horizontal axis: per cent of cases below cutting score. Vertical axis: validity.)


In Figure 1, we see that validity of the peaked test varies considerably
with change in cutting score. Thinking of the validity curve in relation to
the marginal curve showing distribution shape, we see that screening validity 
is high if a cut is made at the trough of the bimodal distribution. But half 
the people, whose ability is below the scale value of the items, pass items only 
by chance, and any cut which chooses among the lower half of the cases does 
so only by chance. Hence this peaked test with perfectly reliable items 
(except for the effect of chance) is inflexible, and unsuitable for a situation 
where the selection ratio may change. As the items vary more in difficulty, 
the maximum validity begins to drop but the validity stays nearly constant 
over an increasingly wide range of selection ratios. 













Considering now the additional facts offered by Figure 2, we have these 
generalizations: 

1. For tests of at least moderate length, the distribution of scores is 
bimodal for very precise items of uniform difficulty. An increase in either 
$\sigma_d$ or $\sigma_y$ (or both) causes the distribution to become unimodal and ultimately
normal. (The distributions demonstrating this conclusion are assembled in 
Figure 6.) 

2. The validity curve for a more peaked test rises above the validity 
curve for any less peaked test for some selection ratios in the middle of the 
scale, but falls below for very high or low selection ratios. This suggests that 
we examine the “range of advantage” of one test over the other, as we do 
in Table 3. 


TABLE 3

Superiority of Peaked Test A over Pattern B and Pattern C,
for Five Degrees of Item Precision

$\sigma_d$      Range of Advantage      Range of Advantage
          of A over B*            of A over C*
0         50                      50
.2        40 to 62                38 to 63
.5        34 to 82                34 to 82
1.0       19 to 99 (or 100)       19 to 96
2.0       02 to 99 (or 100)       02 to 99 (or 100)

*Percentage-of-men-rejected for which biserial validity for A is
better than for less peaked test.

3. The range of advantage of the highly peaked test over other patterns
is very small for precise items but increases as $\sigma_d$ increases. For values of
$\sigma_d$ expected in practice, the sharply-peaked test A is superior to the other patterns
over a wide range of selection ratios. Hence the peaked test is not too inflexible
for practical utility.

4. The magnitude of the gain in validity at the maximum, as a result
of reducing the range of items, is much greater for highly reliable items than
for unreliable ones. While decreasing the range of item difficulty increases
validity at most cutting scores when $\sigma_d$ > 1.0, the increase is very slight in
amount.

These results completely confirm Lord’s conclusions 5 and 6. Results for $\eta$
will be discussed in a later section.

Validity and test length. In Figures 3 and 4 we present the validity
functions for a few shorter tests. These results indicate that the conclusions
advanced above are in no way dependent on test length. We note the interesting
fact (Figure 4) that increasing the length of a peaked test where
$\sigma_d$ = 0 does not raise reliability or validity noticeably after the test distribution
is separated into two portions. The validity curve for 30 items would
fall almost identically on the six-item curve in Figure 4.


FIGURE 3
Validity Curves for Shorter Tests, Pattern A vs. Pattern C
(Two panels: 12 items and 18 items, $\sigma_d$ = .5. Horizontal axis: per cent of cases below cutting score. Vertical axis: validity ($r_{bis}$).)


FIGURE 4
Changes in Validity Curve of Peaked Test with Increasing Length
(Pattern A. Horizontal axis: per cent of cases below cutting score. Vertical axis: validity ($r_{bis}$).)

Validity related to item precision. Figure 5 shows validity curves for
varying $\sigma_d$, pattern being constant in any chart. The results resemble in
some ways the curves that come from varying pattern: the tests with greater
precision are most valid for intermediate selection ratios but have inferior
validity at the extremes. The chart for Pattern D seems to contradict this,
but only because the curves are incomplete at the very ends. Markedly
increasing item precision reduces the flexibility of a test for screening without
any appreciable gain in validity at the point of greatest efficiency. This
conclusion, which agrees with Tucker’s (6), we shall discuss further below.


FIGURE 5
Validity Curves for Varying Item Precision
(Four panels, 30 items each: Patterns A, B, C, and D. Horizontal axis: per cent of cases below cutting score. Vertical axis: validity.)

It is strange that improving individual items should lower validity. The 
explanation of our paradox is this: If an item has perfect precision, it gives 
no information about which of the men whose criterion score is below y; are 
best. All of these men will have the same score (zero) on a group of perfectly 
precise free-response items, if guessing is impossible. If each item allows 
two or more choices, the scores will vary but the differences will not be related 
to ability. Since the obtained scores are equal or differ only by chance, the 
test does not discriminate among low-ability men having different criterion 
scores. Likewise the peaked test gives no information about individual differ- 
ences within the high-ability group, whose thresholds are above the scale- 
position of the items. In a less precise item, the proportion passing is a sloping 
function of criterion score, and a man whose ability falls slightly below the 
scale position of the item will tend to earn a higher score than the man who 
is far below the scale position. Each item contributes information along the 
whole scale. Hence in the peaked test, less precise items do discriminate 
better than precise items at all cutting scores except where the precise item 
has maximum validity. 


TABLE 4 


Validity at Two Cutting Scores as a Function of Precision and Pattern 














Cut to eliminate poorest 50% Cut to eliminate poorest 90% 
$\sigma_d$ A B C D $\sigma_d$ A B C D
0 1.00 1.00 .96 .895 0 .44 .63 .84 .98 
2 41.00 1.00 .98 .2 57 .73 95 
5 99 .985 .975 .89 1B .80 .84 .96 91 
1.0 96 305 < 294 1.0 93 .92 .92 
2.0 B45 .84 .83 .79 2.0 84 .83 .82 .79 





The highest validity in each row is italicized. 


As Table 4 indicates, validity of a test at extreme cutting scores may 
be increased by increasing either $\sigma_d$ or $\sigma_y$ up to a certain point. It appears
that when we have inaccurate items we gain validity over nearly the whole 
range by peaking the item difficulty distribution. When we have very ac- 
curate items, greatest validity over the range is obtained with a spread of 
item difficulties. 

A comprehensive interpretation. Several of our results indicate that $\sigma_d$
and $\sigma_y$ have much the same effect on score distribution and validity. This
effect would be even more striking if we had not, for ease of computation,
used a discontinuous pattern of y’s while letting error of measurement be
continuous. As it is, in Figure 5, the curve for Pattern A, $\sigma_d$ = .5 ($\sigma_y$ = 0),
falls exactly between the curves for Pattern B, $\sigma_d$ = 0 ($\sigma_y$ = .3), and Pattern
C, $\sigma_d$ = 0 ($\sigma_y$ = .7). We are indebted to Dr. R. S. Gales for putting us on the
track of an explanation for this relation.

When $\sigma_d$ > 0, a single person sometimes passes an item, sometimes fails
it. This may be regarded as sheer inconsistency, but in looking for deeper 
causes we can describe it as a variation in the person’s instantaneous threshold. 
In the pitch test, the person’s threshold presumably varies as a result of 
fluctuations in the physical or electrical states of his ear and brain, of fluctua- 
tions in alertness, and so on. His “internal noise level’’ changes. In effect, 
the difference between signal (test item) and noise level (threshold) changes 
just as it would if his ability remained perfectly stable and the stimulus 
changed in discriminability. Another source of unreliability might be varia- 
tion in signal received, where items are supposed to be equally difficult. A 
noise made by a neighbor, a momentary distraction, or a variation in turn- 
table speed would cause an item to be harder than others of supposedly 
equal scale value. These variations, too, alter the difference between momentary
item scale value and momentary threshold. The effect of this shift
in $y_{it} - x_t$ (the t indicating value at a particular time) is to shift the ogive
$p = f(y - x)$ to right or left. There is no operation by which we can distinguish,
for a single person, between changes in scale value of item presented,
in scale value of item received, and in threshold of the person. Hence on any 
test we would expect identical probability of passing any item or attaining 
any score, and consequent identical validity, whichever of these varies by 
a given amount. 

An increase in $\sigma_y$ (variation in true item difficulty) has the same consequences
as an increase in $\sigma_d$ (variation in performance from trial to trial),
so long as we are dealing with homogeneous items. Therefore we may seek
generalizations in terms of these interrelated phenomena.
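
The equivalence argued above can be made concrete with a short worked identity. It is offered as an idealized sketch, not as the authors' derivation: it assumes the item scale values are themselves normally distributed about a mean $\bar{y}$ with s.d. $\sigma_y$ (the patterns actually studied are discrete), and it writes $\Phi$ for the unit normal integral, a symbol not used in the original.

```latex
% Trial-to-trial error e ~ N(0, \sigma_d^2), independent of the item scale value y ~ N(\bar{y}, \sigma_y^2).
% Apart from guessing, a person of ability x passes when y + e < x, so averaging over items:
\begin{align*}
  \mathbb{E}_y\!\left[\Phi\!\left(\frac{x - y}{\sigma_d}\right)\right]
    &= \Pr(y + e < x) \\
    &= \Phi\!\left(\frac{x - \bar{y}}{\sqrt{\sigma_d^2 + \sigma_y^2}}\right),
\end{align*}
% because y + e is normal with mean \bar{y} and variance \sigma_d^2 + \sigma_y^2.
% With three-choice items the average probability of passing is then
\[
  \bar{p}(x) = \tfrac{1}{3} + \tfrac{2}{3}\,
    \Phi\!\left(\frac{x - \bar{y}}{\sqrt{\sigma_d^2 + \sigma_y^2}}\right).
\]
```

Only the sum $\sigma_d^2 + \sigma_y^2$ enters the average item curve, which is the sense in which the two sources of variation are interchangeable. The identity concerns the average probability of passing an item; the full score distribution for a discrete pattern need not coincide exactly with that of a peaked test having the same total variance.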

When $\sigma_y$ = 0 and $\sigma_d$ = 0, we have a test which is highly precise and
highly specific. It gives perfect information as to the classification of persons
around a single cutting score (as soon as enough items are used to separate
scores attained by chance from earned scores). It gives no valid discrimination
within either the superior or the inferior group. Such a test measures, not 
pitch-threshold on a continuous scale, but ability to pass a particular level 
as an all-or-none quality. There is no advantage in lengthening such a test 
beyond a small number of items. 

When total variance in relative threshold ($\sigma_d^2 + \sigma_y^2$) increases, the test
becomes gradually less specific and less precise. In the usual testing situation
the quality of items is hard to improve after care has been used in preparing
them, and $\sigma_d$ may therefore be regarded as given. This fixes the upper limit
of specificity and the upper limit of validity for the best cutting score and
given test length. Considering Figure 2, $\sigma_d$ = .5, as an example, the curve
for Pattern A ($\sigma_y$ = 0) shows the maximum validity for this type of item,
and shows that validity holds up well for cutting scores which reject from
35 to 80 per cent of the group. If such flexibility is enough for one’s purposes,
he can do no better than to use the peaked pattern. If he must have greater
flexibility, he can increase $\sigma_y$ and so obtain validity at more extreme cutting
scores, thereby sacrificing to a greater or less degree the validity at the
point of greatest efficiency.

This rationale suggests that we can clarify our generalizations by
examining the data in terms of the variance of relative thresholds within
persons. This is $\sigma_d^2 + \sigma_y^2$, the variance of the difference between the person's
momentary threshold and the item scale value over all items and over many
trials. Applying the concept to our materials, we present in Table 5 the
variances for each of the thirty-item patterns of this study.


TABLE 5
Variance in Relative Threshold for Various Test Characteristics

               Pattern A         Pattern B          Pattern C          Pattern D
$\sigma_d$     $\sigma_y = 0$    $\sigma_y = .3$    $\sigma_y = .7$    $\sigma_y = 1.4$
0              0                 .10                .50                2.06
0.2            0.04              .14                .54                2.10
0.5            0.25              .35                .75                2.31
1.0            1.00              1.10               1.50               3.06
2.0            4.00              4.10               4.50               6.06
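The entries of Table 5 can be regenerated by adding the squared item standard deviation to the item-difficulty variance of each pattern. The sketch below is only an illustration: the variable names are hypothetical, and the pattern variances (.00, .10, .50, 2.06) are taken from the first row of the table rather than from the rounded $\sigma_y$ values in the column headings.

```python
# Item standard deviations (the row labels of Table 5).
sigma_d_values = [0.0, 0.2, 0.5, 1.0, 2.0]

# Variance of item difficulties for each pattern, read from the first
# row of Table 5 (assumption: these are the unrounded sigma_y squared).
sigma_y_squared = {"A": 0.00, "B": 0.10, "C": 0.50, "D": 2.06}


def relative_threshold_variance(sigma_d, pattern):
    """Variance of relative threshold: sigma_d^2 + sigma_y^2."""
    return sigma_d ** 2 + sigma_y_squared[pattern]


table5 = {
    (sd, pattern): round(relative_threshold_variance(sd, pattern), 2)
    for sd in sigma_d_values
    for pattern in sigma_y_squared
}

for sd in sigma_d_values:
    row = [table5[(sd, pattern)] for pattern in "ABCD"]
    print(sd, row)
```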





In Figure 6, we have arranged the relative score distributions for eight
tests, selected to show a regular increment in $\sigma_d^2 + \sigma_y^2$ from .00 to 6.06. It
is seen that the curves progress regularly from a bimodal distribution through
a relatively flat distribution toward a normal distribution. Any departure
from the regular trend arises because changes in item difficulty were
discontinuous, whereas errors of measurement were normally distributed at all
times.

When we examine the validity curves of Figures 1 and 2 in order of 
variance in relative threshold we find a regular progression over the entire 
series of twenty curves (two not actually computed). The progression shows 
the curves to form a single family, save for discrepancies introduced by the 
discontinuity of item scale values. We can now state our major generalization 
as follows: For three-choice items representing a single ability in a test of 
fixed length and fixed average item difficulty, as $\sigma_d^2 + \sigma_y^2$ increases,


1. the validity of the test at its point of greatest efficiency decreases; 

2. the validity has nearly the maximum value over an increasing range 
of selection ratios (i.e., the test is more adaptable to new selection 
ratios). 








FIGURE 6

Distributions of Scores, Expressed as Per Cent of Possible Score, for Patterns Representing
Increasing Variance of Relative Threshold

[Eight panels of score distributions; horizontal axes are marked at 50 and 100 per cent of possible score.]











For relatively unreliable items such as are normally encountered in practice,
sufficient flexibility is guaranteed by the magnitude of $\sigma_d^2$, and uniform item
difficulty is advisable in order to minimize $\sigma_d^2 + \sigma_y^2$ and thereby maintain validity
as high as possible.


Optimum Item Difficulty for a Given Selection Ratio 


Where chance is not a factor, a test is expected to be most efficient at 
a selection ratio which corresponds to the proportion passing the items. 
For the three-choice test, however, the maximum validity is usually found 
to the right of the mean scale position of the test items. In the tests so far 
discussed the mean y has been 0*, which corresponds to a selection ratio 
that fails 50 per cent of the men. The rightward displacement cannot be 
determined accurately for the flat-topped curves, but it is seen, for example, 
that Pattern A when $\sigma_d = .5$ has its maximum validity when about 57 per
cent of the men are rejected. (Cf. Lord’s conclusion 4.) 

This rightward displacement is shown especially by the series of curves 
in Figure 7 where five peaked tests, each six items long, are analyzed. The 
tests represent five levels of difficulty, and in each case the maximum validity 
is to the right of the scale position of the items. 

The displacement is explained by the fact that guessing increases as 
men become less able. The scores of poorer men are therefore more subject 
to error of measurement, and the test discriminates less accurately among 
these men whose scores contain more variance due to guessing. This addi- 
tional error at the left end of the scale pulls down the validity in that region, 
and so shifts the maximum of the curve rightward. This finding is also re- 
ported by Lord (4, p. 35) and Denton (personal communication). 

In the case where item precision becomes very high, a different effect 
can be observed. A very large proportion of the cases earn a perfect score, 
and the test cannot discriminate among them. This is illustrated in Figure 4. 
In the curve for the single item with $\sigma_d = 0$, two-thirds of the cases earn a
perfect score and the test cannot make a better discrimination to the right 
of the 33% cutting point than it does at that point. Hence in this case the 
introduction of chance moved the maximum of the validity function to the 
left. Such a leftward movement occurs when the proportion of cases earning 
the maximum raw score exceeds the proportion whose criterion score is above 
the scale position of the items. 
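The two-thirds figure for the single item can be checked with a short calculation. The sketch assumes (as the text states for the tests discussed) that the item sits at the mean scale position, so that half of the men are above its threshold, and that every man below threshold guesses at random among the three choices. The function name is hypothetical.

```python
def proportion_passing(p_above_threshold, n_choices):
    """Proportion passing one perfectly precise item when everyone
    below threshold guesses at random among n_choices alternatives."""
    return p_above_threshold + (1.0 - p_above_threshold) / n_choices


# Item at the mean scale position: half the group is above threshold.
perfect_score_rate = proportion_passing(0.5, 3)
print(round(perfect_score_rate, 3))      # 0.667

# The remaining third fails, which is the 33 per cent cutting point.
print(round(1.0 - perfect_score_rate, 3))  # 0.333
```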

Our results are summarized in the conclusion that the rightward shift 
is greater as the proportion of variance due to guessing increases. Except 
for the effect of piling up at a single score just discussed, the shift is greater 
as the probability of guessing correctly on a single item increases (i.e., when 
the number of choices per item is reduced). For any given number of alterna- 
tives the shift to the right is greater when 

*Except for 30-item Pattern D, where the mean y is +0.25. 








FIGURE 7

Validity Curves for Peaked Tests of Varying Difficulty

Each test is based on six items, Pattern A, $\sigma_d = .5$. Horizontal axis shows selection ratio
(per cent failed). Vertical axis shows biserial r. Arrow indicates scale value at which
test is peaked.

[Five panels, for tests peaked at y = -1.0, -0.5, 0, +0.5, and +1.0.]














(a) the items become more difficult for the group tested, and/or

(b) the test is shorter, and/or

(c) $\sigma_d^2 + \sigma_y^2$ increases (the items vary more in difficulty, and/or the
items have less precision).


Our data include only three-choice items, and the basic condition is not 
verified here, but the sub-conditions are confirmed in the charts presented 
(esp. Figs. 1, 4, 5, 7). 

In order to design a test which rejects the poorest F per cent of the men tested, 
items should on the average be located at or above the threshold for men whose 
true ability is at the Fth percentile. These men are at the borderline on the 
criterion. Then according to our results these borderline men should have a 
fifty per cent, or more than a fifty per cent, chance of passing the mean test 
item, after correction for chance. That is, test items should ordinarily be 
easier than items at the threshold of these borderline men. Just how much 
easier depends upon the factors listed in the preceding paragraph. An ex- 
ception is the case where many men pile up at a single score as discussed 
above. 

This recommendation departs from the published view that the pro- 
portion passing an item should correspond exactly to the proportion intended 
to be selected. Our rightward shift, which dictates use of items easier than 
the threshold for borderline men, results from our use of three-choice items. 
Previous theory has always been qualified with the comment that it applies 
only to free-response items where chance success is not allowed. Lord has 
pointed out another factor, however, which casts doubt on the published 
generalization in the free-response case (4, p. 33-35). 


Results: Over-all Validity 


While our study was designed to resolve questions regarding screening
validity, it is also of interest to evaluate the over-all correlation of scores with
criterion. $\eta$, which is substantially the same as the correlation of normalized
test scores with the normal criterion, indicates how much the test can
contribute in prediction and in multiple correlation. Our data were not in a
form which permitted accurate computation of $\eta$ for some tests, because of
use of broad categories. For this reason, two of the entries in Table 6 are
uncertain.


TABLE 6
Correlation ($\eta$) of Test with Criterion

                         Pattern
$\sigma_d$     A        B        C        D
0              .80      *        **       .90
.2             .87      .91      .94
.5             .92      .92      .93      .88
1.0            .92      .91      .91
2.0            .82      .82      .81      .77

*Computed as greater than .86; probably near to .91.
**Computed as greater than .91; probably near to .94.

As in Tucker's study of a peaked test (6), validity is seen to increase
and then decrease as item precision rises. Insofar as we can compare the present
data with Tucker's, the introduction of fixed responses and resultant guessing
lowers validity for higher values of $\sigma_d$. Apparently use of $\eta$ rather than r
had no substantial effect on validities, but a firm conclusion on this cannot
be offered.

When data in Table 6 are organized according to $\sigma_d^2 + \sigma_y^2$, a clear
functional relation exists:


$\sigma_d^2 + \sigma_y^2$   .00   .04   .14   .25   .54   1.00   1.50   2.06   4.00   6.06
$\eta$                      .80   .87   .91   .92   .94   .92    .91    .90    .82    .77
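The tabulated relation can be scanned for its maximum and for the range over which validity stays near that maximum. This is a minimal sketch using only the ten pairs listed above; the .90 cutoff for "near the maximum" is an arbitrary illustrative choice, not a value from the study.

```python
# Variance of relative threshold and over-all validity (eta), as tabulated.
variance = [0.00, 0.04, 0.14, 0.25, 0.54, 1.00, 1.50, 2.06, 4.00, 6.06]
eta = [0.80, 0.87, 0.91, 0.92, 0.94, 0.92, 0.91, 0.90, 0.82, 0.77]

# Tabulated point with the highest validity.
best_index = max(range(len(eta)), key=lambda i: eta[i])
best_variance, best_eta = variance[best_index], eta[best_index]
print(best_variance, best_eta)  # 0.54 0.94

# Variances whose validity is at least .90 (illustrative cutoff).
near_max = [v for v, e in zip(variance, eta) if e >= 0.90]
print(near_max)
```

The highest tabulated value falls at .54, in line with the statement that the maximizing variance is about .50, and the broad span of variances with validity of .90 or more reflects the small slope of the curve.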


Because the extremely peaked test of precise items does not discriminate
along the entire scale, for any given length of test there is a value of $\sigma_d^2 + \sigma_y^2$
which gives maximum validity. For the thirty-item test, this maximizing
value is about .50. Hence if items have low to moderate precision so that
$\sigma_d^2 > .50$, the peaked test will have greater validity than any other pattern.
Insofar as we can judge from these and Tucker's results, the peaked test
has superior validity for even lower values of $\sigma_d$ (higher $r_{ij}$) when the test
is longer than thirty items.

When $\sigma_d^2$ is less than the maximizing value, there is a value of $\sigma_y$ which
maximizes $\eta$, i.e., an ideal degree of peaking for the given item precision and
number of items. The curve of validity as a function of $\sigma_d^2 + \sigma_y^2$ has a very
small slope, which implies that precise determination of the maximum is of
little importance.


Summary and Conclusions 


We have examined the validity ($r_{bis}$ and $\eta$) of a univocal test in which
each item has three alternatives. Our findings, in general, apply also to tests 
of the free-response type, but further evidence is needed regarding tests 
where items contain a factor other than the criterion factor. 

Validity is found to depend on the quantity $\sigma_d^2 + \sigma_y^2$, where $\sigma_d$ is a measure
of item precision and $\sigma_y$ reports the spread of item difficulties. As this variance
increases,


1. validity ($\eta$) increases up to a maximum value and then declines. For
a thirty-item test, the maximizing variance is about .50.

2. the distribution of scores becomes more nearly normal.

3. the screening validity ($r_{bis}$) of the test at the selection ratio where
it has maximum efficiency decreases.

4. the screening validity has nearly the maximum value over an
increasing range of selection ratios.


The selection ratio at which a test is most efficient depends on the 
difficulty of items. In order to design a test which rejects the poorest F per 
cent of the men tested, items should on the average be located at or above 
the threshold for men whose true ability is at the Fth percentile. 

In view of the fact that items ordinarily used in mental tests have rather 
low intercorrelations, so that $\sigma_d^2$ is usually greater than .5, we conclude that
narrowing the range of item difficulty will generally have beneficial effects
on the validity of tests. This will maximize $\eta$ (unless the test is very short
or the items unusually precise), and will allow increased validity at the 
best cutting score without greatly sacrificing validity at most other cutting 
scores. Constructors of educational and psychological tests would be wise to 
make item difficulty constant in most of their tests, since this lowers validity 
only for persons having extremely high or low ability. 


REFERENCES 


1. Brogden, H. E. Variation in test validity with variation in the distribution of item
difficulties, number of items, and degree of their intercorrelation. Psychometrika, 1946,
11, 197-214.

2. Carroll, J. B. The effect of difficulty and chance success on correlations between items
or between tests. Psychometrika, 1945, 10, 1-19.

3. Gulliksen, H. The relation of item difficulty and interitem correlation to test variance
and reliability. Psychometrika, 1945, 10, 79-91.

4. Lord, F. M. A theory of test scores and their relation to the trait measured. Res. Bull.
51-13, Educational Testing Service, 1951. See also A theory of test scores. Psychometric
Monograph No. 7, 1952.

5. Richardson, M. W. The relation between the difficulty and the differential validity
of a test. Psychometrika, 1936, 1, 33-49.

6. Tucker, L. R. Maximum validity of a test with equivalent items. Psychometrika,
1946, 11, 1-13.


Manuscript received 8/6/51 
Revised manuscript received 10/8/51 








PSYCHOMETRIKA—VOL. 17, No. 2 
JUNE, 1952 


FINITE MARKOV PROCESSES IN PSYCHOLOGY* 


GEORGE A. MILLER
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 


Finite Markov processes are reviewed and considered for their usefulness
in the description of behavioral data. The various alternative responses in
an experimental situation define a vector space, and changes in the
probabilities of these alternatives are represented by movements in this space.
Methods of fitting the theory to experimental data are considered.

The simplest process, with a constant matrix of transitional probabilities
that is applied repeatedly to represent the effect of successive trials, seems
inadequate for most learning data. A matrix function that may be useful for
learning theory is presented.


In the two general areas where psychology has been relatively successful 
as a quantitative science, i.e., sensory psychology and test construction, 
probabilistic considerations long ago proved their worth. It is characteristic 
of these two areas, however, that the observations are relatively invariant 
in time. The basic parameters can be explored at length because sequential 
effects of measurement are secondary and can be ignored or randomized. 
This fortunate situation makes it possible to use familiar probability models 
based upon independent random variables. 

With the more dynamic problems of psychology, however, this familiar 
model has not often led to profitable results. For example, it is intrinsic in 
the very notion of learning that successive measurements are not inde- 
pendent; attempts to use a theory of independent variables must either fail 
or misrepresent the basic process. Such failures may lead to a rejection of 
statistical concepts as inadequate; a more proper attitude is to abandon the 
assumption of independence and ask what help can be had from dependent 
probabilities. The simplest mathematical models incorporating dependent 
probabilities are the finite Markov processes. In this paper such processes 
are examined for their usefulness and their limitations for describing psycho- 
logical data. 


1. Simple Markov Chains with Two Alternatives. The data from
psychological experiments usually come in the form of sequences of choices
embedded in the time continuum. Often it is possible to ignore the temporal
order in which alternative choices occur. The purpose of this discussion,
however, is to examine situations in which the temporal sequence should
not be ignored. We shall adopt the Markovian model of dependent
probabilities to discuss such sequences. We begin, therefore, with the simplest
possible example of a Markov chain.


*This article was written at the Institute for Advanced Study in Princeton, New
Jersey, while the author was on sabbatical leave from Harvard University.

Consider an experiment in which only two alternative responses are 
possible. A trial consists of a choice of one of these two alternatives. If the 
letters A and B designate these choices, then a sequence of trials might 
produce the sequence of responses ABBAAABA ... , where the durations 
and latencies are ignored. We shall assume that this sequence is produced 
by a Markov process; i.e., that the distribution of probabilities at trial n + 1 
depends upon the outcome of trial n. However, the knowledge of outcomes 
prior to n does not change our description of the system if we know the 
outcome of trial n. In other words, the present state of the system governs 
its future development. 

We adopt the following notation: 


$n$               number of the trial: 0, 1, 2, ....

A and B           the two alternative responses.

$p^{(n)}(A)$      probability of alternative A at trial n.

$p(A)$            asymptotic value of $p^{(n)}(A)$ as $n \to \infty$.

$d_n$             the set of absolute probabilities at trial n, considered as
                  a vector; $[p^{(n)}(A), p^{(n)}(B)]$.

$p_A(B)$          given A at n, the conditional probability of B at n + 1.

$p_A^{(m)}(B)$    given A at n, the conditional probability of B at n + m,
                  m = 2, 3, ....

$T$               matrix of transitional probabilities.

$\lambda_i$       characteristic roots of the matrix $T$.


Alternative A can occur at trial n + 1 in either of two ways. Either it
follows an A on trial n, or it follows a B on trial n. Similarly, B can occur
at n + 1 in either of two ways. This obvious fact leads to the following
equations:

$$
\begin{aligned}
p^{(n)}(A)\,p_A(A) + p^{(n)}(B)\,p_B(A) &= p^{(n+1)}(A) \\
p^{(n)}(A)\,p_A(B) + p^{(n)}(B)\,p_B(B) &= p^{(n+1)}(B).
\end{aligned}
\tag{1}
$$

In matrix notation these equations can be written

$$
\begin{pmatrix} p_A(A) & p_B(A) \\ p_A(B) & p_B(B) \end{pmatrix}
\begin{pmatrix} p^{(n)}(A) \\ p^{(n)}(B) \end{pmatrix}
=
\begin{pmatrix} p^{(n+1)}(A) \\ p^{(n+1)}(B) \end{pmatrix}.
\tag{2}
$$


The reader is assumed to be familiar with the elements of matrix theory.
If the distribution of probabilities on trials n and n + 1 is regarded as the
vectors $d_n$ and $d_{n+1}$ in a two-dimensional space, then the square matrix of
transitional probabilities is a linear transformation or operator mapping $d_n$
into $d_{n+1}$. Thus we can write Eq. (2) as

$$T d_n = d_{n+1}. \tag{3}$$


Any sequence of distributions can be produced by operating upon the
successive $d_i$ by appropriate transformations. For the moment, however, we
shall consider a special case. We shall assume that repeated trials can be
represented as repeated transformations by the same operator. Thus we
can write for the initial trial:

$$T d_0 = d_1.$$

A second trial carries $d_1$ into $d_2$:

$$T d_1 = d_2.$$

In terms of $d_0$, therefore, we can write:

$$T d_1 = T(T d_0) = T^2 d_0 = d_2.$$

Or more generally,

$$T^n d_0 = d_n. \tag{4}$$


Since the probabilities of A and B on successive trials are given by
$T^n d_0$, we proceed to examine the powers of $T$. The elements of $T^n$ are $p_i^{(n)}(j)$,
where $i = A, B$; $j = A, B$. We wish to find a general expression for $T^n$ in
terms of $p_i(j)$ and $n$. From matrix theory we know that every square matrix
with distinct roots is similar* to a diagonal matrix whose diagonal elements
are the characteristic roots $\lambda_i$ of $T$. We designate this similar diagonal matrix
by $\Lambda$, and write

$$\Lambda = S^{-1} T S,$$

where $S$ is a matrix whose columns are the characteristic vectors of $T$. From
this we obtain

$$T = S \Lambda S^{-1}.$$

To obtain the powers of $T$ we note that

$$T^2 = S \Lambda S^{-1} S \Lambda S^{-1} = S \Lambda^2 S^{-1},$$

or more generally,

$$T^n = S \Lambda^n S^{-1}. \tag{5}$$

Powers of $\Lambda$ are simply calculated, for since $\Lambda$ is a diagonal matrix, its powers
are given by the powers of the diagonal elements $\lambda_i$.

*Two matrices are said to be similar when they have the same characteristic roots.

To find $\Lambda$ for the matrix of Eq. (2) we first write the characteristic
equation for the matrix $T$. If we use the fact that $p_A(A) + p_A(B) = 1$ (and
similarly for B subscripts), the determinantal equation can be written in the
convenient form

$$\det(T - \lambda I) = \lambda^2 - [p_A(A) + p_B(B)]\lambda + [p_A(A) - p_B(A)] = 0.$$

The roots of this equation are the characteristic roots of the matrix:

$$\lambda_1 = 1 \quad \text{and} \quad \lambda_2 = p_A(A) - p_B(A).$$

Since the sums of all the columns of $T$ are unity, we note that unity is always
a root of these matrices. Substituting these roots into $T v_i = \lambda_i v_i$ and solving
for the characteristic vectors, $v_i$, we obtain the vectors $[1, p_A(B)/p_B(A)]$
and $[1, -1]$. These vectors comprise the columns of $S$, and so from Eq.
(5) we obtain, after inverting $S$,

$$
T^n =
\begin{pmatrix} 1 & 1 \\ \dfrac{p_A(B)}{p_B(A)} & -1 \end{pmatrix}
\begin{pmatrix} 1^n & 0 \\ 0 & [p_A(A) - p_B(A)]^n \end{pmatrix}
\frac{1}{p_A(B) + p_B(A)}
\begin{pmatrix} p_B(A) & p_B(A) \\ p_A(B) & -p_B(A) \end{pmatrix}.
\tag{6}
$$





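Eq. (6) can be written more conveniently

$$
T^n = \frac{1}{p_A(B) + p_B(A)}
\left\{
\begin{pmatrix} p_B(A) & p_B(A) \\ p_A(B) & p_A(B) \end{pmatrix}
+ [p_A(A) - p_B(A)]^n
\begin{pmatrix} p_A(B) & -p_B(A) \\ -p_A(B) & p_B(A) \end{pmatrix}
\right\}.
\tag{7}
$$

Since $|p_A(A) - p_B(A)| < 1$, the second term on the right of Eq. (7) goes
to zero as $n \to \infty$, so the first term represents the asymptotic form of $T^n$.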
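The closed form of Eq. (7) can be compared with direct repeated multiplication. The sketch below is an illustration only: the transition values $p_A(B) = .2$ and $p_B(A) = .5$ are hypothetical and do not come from the paper, and the function names are invented for the example.

```python
def mat_mul(a, b):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def power_by_multiplication(t, n):
    """T^n obtained by applying T to the identity n times."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        result = mat_mul(t, result)
    return result


def power_closed_form(p_a_b, p_b_a, n):
    """T^n from Eq. (7), written in terms of p_A(B) and p_B(A)."""
    root = (1.0 - p_a_b) - p_b_a                    # lambda_2 = p_A(A) - p_B(A)
    denom = p_a_b + p_b_a
    steady = [[p_b_a, p_b_a], [p_a_b, p_a_b]]       # asymptotic term
    transient = [[p_a_b, -p_b_a], [-p_a_b, p_b_a]]  # term multiplied by root^n
    return [[(steady[i][j] + root ** n * transient[i][j]) / denom
             for j in range(2)] for i in range(2)]


P_A_B, P_B_A = 0.2, 0.5          # hypothetical transition probabilities
T = [[1.0 - P_A_B, P_B_A],       # columns: from A, from B
     [P_A_B, 1.0 - P_B_A]]       # rows: to A, to B

max_gap = max(
    abs(power_by_multiplication(T, n)[i][j]
        - power_closed_form(P_A_B, P_B_A, n)[i][j])
    for n in range(8) for i in range(2) for j in range(2)
)
print(max_gap < 1e-12)
```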

With Eq. (7) we can calculate $T^n d_0$, and so obtain the probability of
A on successive trials:

$$
p^{(n)}(A) = \frac{p_B(A)}{p_A(B) + p_B(A)}
+ [p_A(A) - p_B(A)]^n \,
\frac{p^{(0)}(A)\,p_A(B) - p^{(0)}(B)\,p_B(A)}{p_A(B) + p_B(A)}.
\tag{8}
$$

As $n \to \infty$, the value of

$$p^{(n)}(A) \to \frac{p_B(A)}{p_A(B) + p_B(A)}.$$


It is apparent that Eq. (8) can be written

$$p^{(n)}(A) = a(1 - b e^{-cn}), \tag{9}$$

where

$$a = \frac{p_B(A)}{p_A(B) + p_B(A)},$$

$$b = -p^{(0)}(A)\,\frac{p_A(B)}{p_B(A)} + p^{(0)}(B),$$

$$c = -\ln\,[p_A(A) - p_B(A)].$$


Eq. (9) is an exponential growth function—a form frequently used to de- 
scribe data from learning experiments. It should be noted, however, that 
while the average subject may follow such a learning function, the individual 
subjects are generating stationary time series that do not represent learning. 
The term “learning” probably should be reserved for those cases in which 
the matrix operator changes on successive trials. 

We shall illustrate the use of the Markov chain with a numerical
example. Suppose that two alternative responses are called right (R) and
wrong (W), that $p^{(n)}(R)$ and $p^{(n)}(W)$ are measured by the percentage of
subjects in a large sample that choose R and W on trial n, and that the
transitional probabilities observed on successive pairs of trials are constant.
Assume the following numerical values for $T d_0 = d_1$:

$$
\begin{pmatrix} .97 & .27 \\ .03 & .73 \end{pmatrix}
\begin{pmatrix} 0 \\ 1 \end{pmatrix}
=
\begin{pmatrix} .27 \\ .73 \end{pmatrix}.
$$

A right response is followed by another right response 97 per cent of the
time; wrong follows wrong 73 per cent of the time. From Eq. (8) we calculate
that the successive values of $p^{(n)}(R)$ are 0, .27, .46, .59, .68, etc., approaching
the asymptote of .90. The equation is

$$p^{(n)}(R) = .9(1 - .7^n) \qquad (n = 0, 1, 2, \ldots).$$

If we know that on a particular trial a W occurred, this equation gives the
probability of R on the nth succeeding trial.
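The numerical example can be reproduced by applying the transformation repeatedly and comparing the result with the fitted equation. This is a minimal sketch assuming the matrix implied by the text (right follows right .97, wrong follows wrong .73) and an initial distribution in which every subject is wrong; the helper name `step` is hypothetical.

```python
# Transition matrix: columns are "from R" and "from W", rows "to R" and "to W".
T = [[0.97, 0.27],
     [0.03, 0.73]]


def step(t, d):
    """Apply the 2 x 2 transformation t to the distribution d."""
    return [t[0][0] * d[0] + t[0][1] * d[1],
            t[1][0] * d[0] + t[1][1] * d[1]]


d = [0.0, 1.0]                      # d_0: all subjects wrong at trial 0
iterated = []
for n in range(5):
    iterated.append(d[0])           # p^(n)(R)
    d = step(T, d)

closed_form = [0.9 * (1 - 0.7 ** n) for n in range(5)]

print([round(x, 2) for x in iterated])     # [0.0, 0.27, 0.46, 0.59, 0.68]
print([round(x, 2) for x in closed_form])  # [0.0, 0.27, 0.46, 0.59, 0.68]
```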


2. Autocorrelation Function. A simple parameter of such Markov chains 
is the autocorrelation function. We will mention it now because for the more 
complex cases we wish to consider next the autocorrelation function is either 
not defined or is most tedious to compute from the matrix of transitional 
probabilities. 

The autocorrelation function is the correlation of a time series with itself
displaced 0, 1, 2, ... steps. With zero displacement the correlation of the
series with itself is, of course, +1. With a displacement of one step, the
responses on trials 1, 2, 3, ... are correlated with the responses on trials
2, 3, 4, .... If the series of binary choices is fairly long, the autocorrelation
after a displacement of one step is given by

$$r_1 = p_A(A) - p_B(A). \tag{10}$$

We note that $r_1$ is a characteristic root of the matrix of transitional
probabilities. More generally,

$$r_m = p_A^{(m)}(A) - p_B^{(m)}(A), \tag{11}$$

where $p_A^{(m)}(A)$ and $p_B^{(m)}(A)$ are elements of $T^m$. From Eq. (7) we observe
that these elements of $T^m$ are

$$p_A^{(m)}(A) = \frac{p_B(A) + p_A(B)\,[p_A(A) - p_B(A)]^m}{p_A(B) + p_B(A)}$$

and

$$p_B^{(m)}(A) = \frac{p_B(A) - p_B(A)\,[p_A(A) - p_B(A)]^m}{p_A(B) + p_B(A)}.$$

When these values are substituted in Eq. (11), we obtain

$$r_m = [p_A(A) - p_B(A)]^m = r_1^m. \tag{12}$$

In short, for a simple Markov chain, the autocorrelation between positions
n and n + m is the mth power of the autocorrelation between n and n + 1.
If $|r_1| < 1$, then $|r_m|$ declines monotonically toward zero.

A simple example is provided by the Samoan language. E. B. Newman
has noted that the sequence of consonants (C) and vowels (V) in Samoan
writing is adequately described as a Markov chain with the following matrix
of transitional probabilities:

$$
\begin{pmatrix} p_C(C) & p_V(C) \\ p_C(V) & p_V(V) \end{pmatrix}
=
\begin{pmatrix} 0 & .49 \\ 1 & .51 \end{pmatrix}.
$$

Consonants never follow consonants in written Samoan. The autocorrelation
function is easily computed from this matrix. For successive displacements
of one letter the value of the correlation coefficient is 1, -.49, .24, -.12,
.06, -.03, etc.

The autocorrelation function for this simple process can also be
described as the determinant of $T^m$. Thus $r_0$ is the determinant of $T^0 = I$, $r_1$
is the determinant of $T$, $r_2$ is the determinant of $T^2$, etc.
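Both descriptions of the autocorrelation function can be checked on the Samoan example. The sketch assumes the matrix in which consonants never follow consonants and a consonant follows a vowel with probability .49, the value implied by the quoted correlations; the helper names are hypothetical.

```python
# Rows: to C, to V. Columns: from C, from V.
T = [[0.0, 0.49],
     [1.0, 0.51]]


def mat_mul(a, b):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


r1 = T[0][0] - T[0][1]          # Eq. (10): p_C(C) - p_V(C)

power = [[1.0, 0.0], [0.0, 1.0]]  # T^0 = I
by_power = []                     # r_m as the mth power of r_1, Eq. (12)
by_determinant = []               # r_m as the determinant of T^m
for m in range(6):
    by_power.append(r1 ** m)
    by_determinant.append(det2(power))
    power = mat_mul(T, power)

print([round(r, 2) for r in by_power])
# [1.0, -0.49, 0.24, -0.12, 0.06, -0.03]
```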

When the distribution of probabilities at n + 1 depends upon events 
prior to n as well as upon n itself, Eq. (10) still holds as a definition of the 
autocorrelation function, but Eq. (11) does not hold. When more than two 
unscaled alternatives are used, the autocorrelation function is not defined. 







3. Extension to More than Two Alternatives. The extension of the matrix
equations to experiments involving more than two alternative responses is
straightforward. Designate the alternatives A, B, C, ... , N. Then we have

$$
\begin{pmatrix}
p_A(A) & p_B(A) & \cdots & p_N(A) \\
p_A(B) & p_B(B) & \cdots & p_N(B) \\
\vdots & \vdots & & \vdots \\
p_A(N) & p_B(N) & \cdots & p_N(N)
\end{pmatrix}
\begin{pmatrix} p^{(n)}(A) \\ p^{(n)}(B) \\ \vdots \\ p^{(n)}(N) \end{pmatrix}
=
\begin{pmatrix} p^{(n+1)}(A) \\ p^{(n+1)}(B) \\ \vdots \\ p^{(n+1)}(N) \end{pmatrix}.
\tag{13}
$$


General solutions are known for certain types of operators. These are of 
considerable interest in physics and genetics, where the elements of $T$ are
given by theory. The present use of such operators is almost purely de- 
scriptive, however, for we do not know what special types of matrices will 
be of the greatest psychological interest. 

It is not always necessary to find a general solution. A qualitative
understanding of an experimental situation is often provided by simply
transforming the initial distribution five or ten steps by direct matrix
multiplication. For example, a learning situation might be analyzed into three
kinds of responses: correct (C), slightly wrong (S), and grossly wrong (G).
During the course of learning a subject begins by making gross mistakes,
then slight mistakes, and finally manages to make correct responses. Such a
situation could produce a matrix equation like the following:

$$
\begin{pmatrix}
p_C(C) & p_S(C) & p_G(C) \\
p_C(S) & p_S(S) & p_G(S) \\
p_C(G) & p_S(G) & p_G(G)
\end{pmatrix}
\begin{pmatrix} p^{(0)}(C) \\ p^{(0)}(S) \\ p^{(0)}(G) \end{pmatrix}
=
\begin{pmatrix} .9 & .3 & 0 \\ .1 & .6 & .3 \\ 0 & .1 & .7 \end{pmatrix}
\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.
$$

It is tedious to find the general solution of $T^n$, and it is easy to see by direct
multiplication what happens. The proportion of grossly wrong responses
declines steadily: 1, .7, .52, .40, .32, .26, ... , .08. The proportion of small
errors on successive trials at first increases, then decreases: 0, .3, .39, .40,
.38, .35, ... , .23. The proportion of correct responses gives a roughly
S-shaped function: 0, 0, .09, .20, .30, .38, .45, ... , .69. This situation is analogous
to pouring water from one vessel into a second, which in turn pours the water
into a third. The asymptotic distribution can always be found by solving
the equation $T d_\infty = d_\infty$.
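The direct multiplication described above is easy to carry out. The sketch assumes the transition matrix with columns (.9, .1, 0), (.3, .6, .3), and (0, .1, .7) for C, S, and G, a reading chosen because it reproduces the proportions quoted in the text; the 200-step iteration used to approximate the asymptote is an arbitrary illustrative choice.

```python
# Rows: to C, to S, to G. Columns: from C, from S, from G.
T = [[0.9, 0.3, 0.0],
     [0.1, 0.6, 0.3],
     [0.0, 0.1, 0.7]]


def step(t, d):
    """Apply the 3 x 3 transformation t to the distribution d."""
    return [sum(t[i][j] * d[j] for j in range(3)) for i in range(3)]


history = [[0.0, 0.0, 1.0]]          # d_0: every response grossly wrong
for _ in range(200):
    history.append(step(T, history[-1]))

# First trials: proportions of C, S, and G responses.
for n in range(7):
    print(n, [round(x, 2) for x in history[n]])

# Approximate asymptotic distribution after many steps.
asymptote = history[-1]
print([round(x, 2) for x in asymptote])
```

Solving $T d_\infty = d_\infty$ by hand for this matrix gives the proportions 9/13, 3/13, and 1/13, which the iteration approaches.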

The form of a general solution can be indicated, for finite matrices with
distinct roots, as follows. Let $\lambda_i$ represent the N characteristic roots of the
polynomial $\det(T - \lambda I)$. We define a set of matrices $f_i(T)$ by

$$
f_i(T) = \frac{(T - \lambda_1 I)(T - \lambda_2 I) \cdots (T - \lambda_{i-1} I)(T - \lambda_{i+1} I) \cdots (T - \lambda_N I)}
{(\lambda_i - \lambda_1)(\lambda_i - \lambda_2) \cdots (\lambda_i - \lambda_{i-1})(\lambda_i - \lambda_{i+1}) \cdots (\lambda_i - \lambda_N)}.
\tag{14}
$$

In terms of these matrices, $T$ can be expressed

$$T = \lambda_1 f_1(T) + \lambda_2 f_2(T) + \cdots + \lambda_N f_N(T). \tag{15}$$

If $g(\lambda)$ is a rational scalar polynomial, then

$$g(T) = g(\lambda_1) f_1(T) + g(\lambda_2) f_2(T) + \cdots + g(\lambda_N) f_N(T). \tag{16}$$

In particular, if $g(\lambda) = \lambda^n$, we have

$$T^n = \lambda_1^n f_1(T) + \lambda_2^n f_2(T) + \cdots + \lambda_N^n f_N(T). \tag{17}$$

The 2 X 2 transformation is expressed in this form in Eq. (7). Concerning
the roots $\lambda_i$, we know that $\lambda_1$ can be assigned the value 1, and that all the
other roots fall between -1 and +1. Thus the asymptotic value of $T^n$ is
given by $f_1(T)$.

The solution for a particular matrix can always be obtained by (a) finding
the roots of the characteristic polynomial, $\det(T - \lambda I)$; (b) determining the
$f_i(T)$ according to Eq. (14); (c) substituting into Eq. (17); and (d) solving
$T^n d_0$ for the given boundary conditions of $d_0$. This procedure has the
advantage of avoiding the problem of inverting a large matrix, but if two or
more roots are nearly the same, the computations may be quite difficult.
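Steps (b) and (c) can be sketched for the 2 X 2 right/wrong example, whose roots are 1 and .97 - .27 = .70. The code below is an illustration under those assumptions: the roots are supplied by hand rather than found numerically, and the function names are hypothetical.

```python
def mat_mul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def component(t, roots, i):
    """f_i(T) of Eq. (14): product over j != i of
    (T - lambda_j I) / (lambda_i - lambda_j)."""
    n = len(t)
    result = identity(n)
    for j, lam_j in enumerate(roots):
        if j == i:
            continue
        factor = [[(t[r][c] - (lam_j if r == c else 0.0)) / (roots[i] - lam_j)
                   for c in range(n)] for r in range(n)]
        result = mat_mul(result, factor)
    return result


def power_from_components(t, roots, n_trials):
    """T^n from Eq. (17): sum of lambda_i^n f_i(T)."""
    size = len(t)
    comps = [component(t, roots, i) for i in range(len(roots))]
    return [[sum(roots[i] ** n_trials * comps[i][r][c]
                 for i in range(len(roots)))
             for c in range(size)] for r in range(size)]


T = [[0.97, 0.27],
     [0.03, 0.73]]
roots = [1.0, T[0][0] - T[0][1]]      # 1 and p_R(R) - p_W(R)

f1 = component(T, roots, 0)           # asymptotic value of T^n
T5 = power_from_components(T, roots, 5)
print([[round(x, 3) for x in row] for row in f1])
```

Here $f_1(T)$ has identical columns (.9, .1), the asymptotic distribution of the example, whatever the starting distribution.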

The autocorrelation function is not defined for more than two unordered 
alternatives, because the value of the correlation coefficient varies according 
to the various possible assignments of numerical values to the different 
alternatives. However, the determinant of the matrix of transitional prob- 
abilities has many of the characteristics of a correlation coefficient, and in 
the 2 X 2 case the determinant and the autocorrelation coefficient are 
identical. The determinant of $T^n$, as a function of n, lies between +1 and -1,
declines toward 0 for the Markov processes, and can reveal periodicities in 
much the same way as an autocorrelation function. The possible usefulness 
of this extension to N X N transformations needs to be explored. 


4. Extension to Compound Responses. For psychological purposes it is
an inconvenience that Markov processes have no memory. We must now
remove the restriction that, if the outcome of the trial n is known, events
prior to n are irrelevant for predicting the outcome at n + 1. We must
consider the non-Markovian case. What we must do is to expand the definition
of a state of the system in order to make such systems Markovian in a
larger space.

If the probabilities at trial n + 1 depend upon the outcomes of trials n
and n - 1, but knowledge of events prior to n - 1 does not change our
prediction for n + 1, we have a non-Markovian system. This system is made
to be Markovian by changing the definition of an event. Instead of
characterizing the state of the system by the occurrence of a single response,
we characterize it by pairs of responses. If there are two atomic alternatives,
A and B, in the original system, then there are four compound alternatives,
AA, AB, BA, and BB, in the new system. Thus we must define a distribution
$d_n$ over four alternatives, and $T$ is a square matrix of fourth order:

$$
T d_n =
\begin{pmatrix}
p_{AA}(AA) & 0 & p_{BA}(AA) & 0 \\
p_{AA}(AB) & 0 & p_{BA}(AB) & 0 \\
0 & p_{AB}(BA) & 0 & p_{BB}(BA) \\
0 & p_{AB}(BB) & 0 & p_{BB}(BB)
\end{pmatrix}
\begin{pmatrix} p^{(n)}(AA) \\ p^{(n)}(AB) \\ p^{(n)}(BA) \\ p^{(n)}(BB) \end{pmatrix}
=
\begin{pmatrix} p^{(n+1)}(AA) \\ p^{(n+1)}(AB) \\ p^{(n+1)}(BA) \\ p^{(n+1)}(BB) \end{pmatrix}
= d_{n+1}.
\tag{18}
$$

Note that many of the transitional probabilities are zero; it is not possible 
for the system to move from some state to others in a single step. For ex- 
ample, the system cannot move from AA to BB in less than two steps: 
AA — AB — BB as in the sequence AABB. 

Tabulations of sequences of vowels and consonants in written Hebrew 
have been made by E. B. Newman. The sequence of consonants (A) and 
vowels (B) can be adequately represented by a matrix of the form of Eq. (18): 


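\[
\begin{pmatrix}
0 & 0 & .23 & 0 \\
1 & 0 & .77 & 0 \\
0 & .81 & 0 & .90 \\
0 & .19 & 0 & .10
\end{pmatrix}
\begin{pmatrix}
.095 \\ .410 \\ .410 \\ .085
\end{pmatrix}.
\]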


As before, the transformation T can be applied iteratively to carry any
initial distribution into a final, unique, stable distribution.
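
The iteration just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the matrix is the vowel-consonant matrix quoted above, the uniform starting distribution is arbitrary, and the helper names `apply` and `stable_distribution` are hypothetical, not taken from the paper.

```python
# Column-stochastic matrix for the compound states AA, AB, BA, BB:
# column j holds the probabilities of moving from state j to each state.
STATES = ["AA", "AB", "BA", "BB"]
T = [
    [0.00, 0.00, 0.23, 0.00],
    [1.00, 0.00, 0.77, 0.00],
    [0.00, 0.81, 0.00, 0.90],
    [0.00, 0.19, 0.00, 0.10],
]


def apply(matrix, dist):
    """Return matrix * dist for a column vector stored as a list."""
    return [sum(row[j] * dist[j] for j in range(len(dist))) for row in matrix]


def stable_distribution(matrix, dist, tol=1e-12, max_iter=10_000):
    """Apply the transformation repeatedly until the distribution stops changing."""
    for _ in range(max_iter):
        nxt = apply(matrix, dist)
        if max(abs(x - y) for x, y in zip(nxt, dist)) < tol:
            return nxt
        dist = nxt
    raise RuntimeError("iteration did not converge")


# Arbitrary (assumed) starting point: all four compound states equally likely.
d_stable = stable_distribution(T, [0.25, 0.25, 0.25, 0.25])
for state, p in zip(STATES, d_stable):
    print(state, round(p, 3))
```

Starting from a different initial distribution should lead to the same limit, which is what "unique, stable distribution" means here; the limit agrees with the vector printed beside the matrix to within rounding.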

This extension of the Markov process can be carried as far as the data 
seem to merit. For example, fixed-ratio reinforcement in operant conditioning 
requires an animal to respond m times in one way, then approach the food 
tray. In order to keep track of the sequential aspects of this behavior we 
could define a state of the system to include all the possible sequences of 
responses and approaches of length m + 1. Thus there would be $2^{m+1}$ alternative
states, and the transformation would be of order $2^{m+1}$. More complex
sequential dependencies arise in human verbal behavior and can be treated 
in a similar manner. The verbal case is so complex, however, that it cannot 
be adequately discussed in this paper. 

In principle it is possible to extend the Markov definition indefinitely 
to take into account as much of the past history of the system as one desires. 













Cases are known, however, in which the extension would need to be carried 
infinitely far into the past in order for the Markov model to summarize all 
the information. Such cases are better handled in other ways. At present, it 
seems likely that most learning situations will need to be described by these 
other methods, and that Markov processes using a single matrix of transitional 
probabilities are most valuable when the behavior has settled into a relatively 
stable pattern. 


5. Least-Squares Fit to Data. Under the assumption that a single
transformation describes the behavior, every trial can be considered a measurement
of the single transformation T. We wish to find a least-squares solution that
will give the best estimate for T from the available data. The following
procedures may not be the most efficient for Markov processes, but they represent
one fairly natural extension of the procedures used with more familiar
statistical problems.

We introduce a matrix M to represent the observed data. This matrix
is formed by placing in successive columns the distributions observed on
successive trials, from trial 1 through trial n − 1. If each distribution
contains a alternative quantities, and n such distributions are known for
successive trials, then M is an a × (n − 1) matrix. A matrix N is formed
analogously by placing in successive columns the distributions observed on
the successive trials from 2 through n. Thus N is also an a × (n − 1) matrix.
The matrix $\hat{N}$ represents the best estimate of the successive distributions:

\[ \hat{N} = N + C, \tag{19} \]

where the elements of the matrix C are the corrections that must be added
to the observed values in N to give the best estimate $\hat{N}$.

We wish to determine T, the best estimate of the transformation. From
the definition of M and N and the assumption of a single operator throughout
learning, we have the equation:

\[ TM = \hat{N} = N + C. \tag{20} \]

From Eq. (20) we obtain an expression for C:

\[ C = -N + TM. \tag{21} \]

For a least-squares solution, CC′ must be a minimum. This is obtained by
putting the partial derivative with respect to T to zero:

\[ \frac{\partial}{\partial T}\, CC' = MC' = 0. \tag{22} \]

We now substitute for C′ from Eq. (21) into Eq. (22) and obtain

\[ M(-N + TM)' = -MN' + MM'T' = 0. \]














Rearranging terms gives

\[ T' = (MM')^{-1}MN', \]

or

\[ T = NM'(MM')^{-1}. \tag{23} \]

Eq. (23) provides a best estimate of T on the basis of the data matrices M
and N.

As an example, consider an experiment in a T-maze. We decide from 

an examination of the data that the learning process can be described by a 
Markov process with a single transformation. Suppose that 10 rats were run 
for 20 trials, and that on successive trials the following numbers of rats 
made the correct choice: 5, 7, 6, 6, 8, 8, 8, 7, 8, 9, 8, 7, 8, 9, 10, 10, 8, 8, 9, 9. 
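From these data we construct the matrices:

\[
M = \frac{1}{10}\left(\begin{array}{*{19}{r}}
5 & 7 & 6 & 6 & 8 & 8 & 8 & 7 & 8 & 9 & 8 & 7 & 8 & 9 & 10 & 10 & 8 & 8 & 9 \\
5 & 3 & 4 & 4 & 2 & 2 & 2 & 3 & 2 & 1 & 2 & 3 & 2 & 1 & 0 & 0 & 2 & 2 & 1
\end{array}\right),
\]

\[
N = \frac{1}{10}\left(\begin{array}{*{19}{r}}
7 & 6 & 6 & 8 & 8 & 8 & 7 & 8 & 9 & 8 & 7 & 8 & 9 & 10 & 10 & 8 & 8 & 9 & 9 \\
3 & 4 & 4 & 2 & 2 & 2 & 3 & 2 & 1 & 2 & 3 & 2 & 1 & 0 & 0 & 2 & 2 & 1 & 1
\end{array}\right).
\]

Next we multiply these matrices to obtain

\[
NM' = \begin{pmatrix} 12.16 & 3.14 \\ 2.74 & .96 \end{pmatrix},
\qquad
MM' = \begin{pmatrix} 11.99 & 2.91 \\ 2.91 & 1.19 \end{pmatrix}.
\]

The matrix MM′ is easily inverted, and we have

\[
T = NM'(MM')^{-1}
= \begin{pmatrix} 12.16 & 3.14 \\ 2.74 & .96 \end{pmatrix}
\begin{pmatrix} 1.19 & -2.91 \\ -2.91 & 11.99 \end{pmatrix}\frac{1}{5.80},
\]

\[
T = \begin{pmatrix} .92 & .39 \\ .08 & .61 \end{pmatrix}.
\]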


The initial distribution $d_0$ is (.5, .5), and from Eq. (8) we obtain

\[ p^{(n)}(R) = .83 - .33(.53)^n. \]

The values calculated from this equation are .500, .665, .738, .785, .804, ...,
approaching .83 as the asymptote. Note that we do not have a least-squares
fit of this function, $p^{(n)}(R)$, to the observed data; we have a least-squares
fit for the transformation T.
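From Eq. (21) we can calculate the corrections that are added to N:

\[
TM = \left(\begin{array}{*{19}{r}}
.655 & .761 & .708 & .708 & .814 & .814 & .814 & .761 & .814 & .867 & .814 & .761 & .814 & .867 & .920 & .920 & .814 & .814 & .867 \\
.345 & .239 & .292 & .292 & .186 & .186 & .186 & .239 & .186 & .133 & .186 & .239 & .186 & .133 & .080 & .080 & .186 & .186 & .133
\end{array}\right),
\]

\[
C = \left(\begin{array}{*{19}{r}}
-.045 & .161 & .108 & -.092 & .014 & .014 & .114 & -.039 & -.086 & .067 & .114 & -.039 & -.086 & -.133 & -.080 & .120 & .014 & -.086 & -.033 \\
.045 & -.161 & -.108 & .092 & -.014 & -.014 & -.114 & .039 & .086 & -.067 & -.114 & .039 & .086 & .133 & .080 & -.120 & -.014 & .086 & .033
\end{array}\right).
\]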


The squared deviations are given by

\[
CC' = \begin{pmatrix} .144 & -.144 \\ -.144 & .144 \end{pmatrix}.
\]

The best estimate of the dispersion of the calculated from the observed
values is

\[
\sigma = \sqrt{\frac{cc'}{n - a - 1}} = \sqrt{\frac{.144}{17}} = .092. \tag{24}
\]


The variance-covariance matrix V is given by

\[
V = \sigma^2 (MM')^{-1} = \frac{.00847}{5.80}
\begin{pmatrix} 1.19 & -2.91 \\ -2.91 & 11.99 \end{pmatrix}. \tag{25}
\]

From Eq. (25) we compute the standard deviations of the estimates of
$p_A(A)$ and $p_B(B)$:

\[
\sigma[p_A(A)] = .092\sqrt{\frac{1.19}{5.80}} = .042,
\qquad
\sigma[p_B(B)] = .092\sqrt{\frac{11.99}{5.80}} = .132.
\]
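
The arithmetic of this worked example can be checked with a short Python sketch. It assumes only the counts of correct choices given in the text and the formulas of Eqs. (21) and (23) to (25); the helper names (`matmul`, `transpose`, `inverse_2x2`) are hypothetical conveniences, and plain lists are used instead of a matrix library.

```python
# Numbers of rats (out of 10) making the correct choice on trials 1..20.
correct = [5, 7, 6, 6, 8, 8, 8, 7, 8, 9, 8, 7, 8, 9, 10, 10, 8, 8, 9, 9]
rats, a, n = 10, 2, len(correct)   # a = number of alternatives, n = trials


def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse_2x2(A):
    (p, q), (r, s) = A
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


# One observed distribution (proportion correct, proportion incorrect) per trial.
dists = [[c / rats, 1 - c / rats] for c in correct]
M = transpose(dists[:-1])          # trials 1 .. n-1 as columns
N = transpose(dists[1:])           # trials 2 .. n as columns

MMt_inv = inverse_2x2(matmul(M, transpose(M)))
T = matmul(matmul(N, transpose(M)), MMt_inv)        # Eq. (23)

TM = matmul(T, M)
C = [[tm - obs for tm, obs in zip(tm_row, n_row)]   # Eq. (21): C = -N + TM
     for tm_row, n_row in zip(TM, N)]
cc = sum(c * c for c in C[0])                       # a diagonal element of CC'
sigma = (cc / (n - a - 1)) ** 0.5                   # Eq. (24)
sd_A = sigma * MMt_inv[0][0] ** 0.5                 # from Eq. (25)
sd_B = sigma * MMt_inv[1][1] ** 0.5

print([[round(x, 2) for x in row] for row in T], round(sigma, 3))
```

Because the fit is to the transformation T and not to the learning curve itself, the residuals in C are one-step prediction errors: each column compares the distribution predicted from the preceding trial with the one actually observed.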


The same procedure can be applied to the data from a single animal. 


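The data matrices M and N then have either 0 or 1 on successive trials; e.g.,

\[
M = \left(\begin{array}{*{15}{c}}
1 & 0 & 1 & 1 & 1 & 0 & 0 & 1 & 0 & 1 & 1 & \cdots & 0 & 1 & 1 \\
0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 & 1 & 0 & 0 & \cdots & 1 & 0 & 0
\end{array}\right),
\]

\[
N = \left(\begin{array}{*{15}{c}}
0 & 1 & 1 & 1 & 0 & 0 & 1 & 0 & 1 & 1 & 0 & \cdots & 1 & 1 & 1 \\
1 & 0 & 0 & 0 & 1 & 1 & 0 & 1 & 0 & 0 & 1 & \cdots & 0 & 0 & 0
\end{array}\right).
\]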

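In order to solve for T we determine

\[
NM' = \begin{pmatrix} m(1,1) & m(0,1) \\ m(1,0) & m(0,0) \end{pmatrix},
\qquad
MM' = \begin{pmatrix} m(1) & 0 \\ 0 & m(0) \end{pmatrix}.
\]

The symbol m(i,j) represents the number of occurrences of the ordered pair
i,j; m(i) represents the number of occurrences of i; and m(0) + m(1) =
n − 1, where n is the number of trials. Next we invert MM′ and solve for T:

\[
T = NM'(MM')^{-1}
= \begin{pmatrix} m(1,1) & m(0,1) \\ m(1,0) & m(0,0) \end{pmatrix}
\begin{pmatrix} \dfrac{1}{m(1)} & 0 \\ 0 & \dfrac{1}{m(0)} \end{pmatrix}
= \begin{pmatrix}
\dfrac{m(1,1)}{m(1)} & \dfrac{m(0,1)}{m(0)} \\[2ex]
\dfrac{m(1,0)}{m(1)} & \dfrac{m(0,0)}{m(0)}
\end{pmatrix}. \tag{26}
\]

Eq. (26) is the result that would be expected from the definition of the
transitional probabilities.

In order to estimate the dispersion we calculate

\[
TM = \begin{pmatrix}
\dfrac{m(1,1)}{m(1)} & \dfrac{m(0,1)}{m(0)} & \dfrac{m(1,1)}{m(1)} & \dfrac{m(1,1)}{m(1)} & \cdots \\[2ex]
\dfrac{m(1,0)}{m(1)} & \dfrac{m(0,0)}{m(0)} & \dfrac{m(1,0)}{m(1)} & \dfrac{m(1,0)}{m(1)} & \cdots
\end{pmatrix}.
\]

Then from Eq. (21) we find

\[
C = -N + TM = \begin{pmatrix}
\dfrac{m(1,1)}{m(1)} & -\dfrac{m(0,0)}{m(0)} & -\dfrac{m(1,0)}{m(1)} & -\dfrac{m(1,0)}{m(1)} & \cdots \\[2ex]
-\dfrac{m(1,1)}{m(1)} & \dfrac{m(0,0)}{m(0)} & \dfrac{m(1,0)}{m(1)} & \dfrac{m(1,0)}{m(1)} & \cdots
\end{pmatrix}.
\]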


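The squared deviations are given by

\[
CC' = \begin{pmatrix} c & -c \\ -c & c \end{pmatrix},
\]

where

\[
\begin{aligned}
c &= m(1,0)\left[\frac{m(1,1)}{m(1)}\right]^2 + m(1,1)\left[\frac{m(1,0)}{m(1)}\right]^2
   + m(0,0)\left[\frac{m(0,1)}{m(0)}\right]^2 + m(0,1)\left[\frac{m(0,0)}{m(0)}\right]^2 \\
  &= [m(1,0) + m(1,1)]\left[\frac{m(1,1)\,m(1,0)}{m(1)\,m(1)}\right]
   + [m(0,0) + m(0,1)]\left[\frac{m(0,1)\,m(0,0)}{m(0)\,m(0)}\right] \\
  &= m(1)\left[\frac{m(1,1)}{m(1)}\right]\left[\frac{m(1,0)}{m(1)}\right]
   + m(0)\left[\frac{m(0,1)}{m(0)}\right]\left[\frac{m(0,0)}{m(0)}\right].
\end{aligned}
\]

The dispersion is, therefore,

\[
\sigma = \left\{\frac{1}{n - a - 1}\left[
m(1)\,\frac{m(1,1)}{m(1)}\cdot\frac{m(1,0)}{m(1)}
+ m(0)\,\frac{m(0,1)}{m(0)}\cdot\frac{m(0,0)}{m(0)}
\right]\right\}^{1/2}. \tag{27}
\]

The variance-covariance matrix is

\[
V = \sigma^2 (MM')^{-1} = \frac{c}{n - a - 1}
\begin{pmatrix} \dfrac{1}{m(1)} & 0 \\ 0 & \dfrac{1}{m(0)} \end{pmatrix},
\]

and from this matrix we compute

\[
\sigma[p_A(A)] = \frac{\sigma}{\sqrt{m(1)}}
\qquad\text{and}\qquad
\sigma[p_B(B)] = \frac{\sigma}{\sqrt{m(0)}}. \tag{28}
\]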


Although these examples are worked out for the Markov case with two 
alternatives, the same procedures can be used with more than two alternatives 
or with Markov processes defined for compound responses. It should be 
stressed, however, that the statistical properties of Markov chains are neither 
simple nor well understood. Better techniques will undoubtedly develop as 
the Markov process becomes more widely applied. 
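
For a single animal the least-squares estimate reduces to counting ordered pairs, and the following Python sketch illustrates that reduction. The response sequence is hypothetical and chosen only for illustration, the function name `transition_estimate` is an assumption, and the dispersion uses the divisor n − a − 1 as in Eq. (24). The sketch also assumes that both alternatives occur at least once among the first n − 1 trials, since otherwise m(0) or m(1) is zero.

```python
from collections import Counter


def transition_estimate(seq, a=2):
    """Estimate T (Eq. 26) and the dispersion (Eq. 27) from a 0/1 sequence."""
    n = len(seq)
    pairs = Counter(zip(seq, seq[1:]))   # m(i, j): i on one trial, j on the next
    m = Counter(seq[:-1])                # m(i): occurrences of i on trials 1..n-1
    T = [
        [pairs[(1, 1)] / m[1], pairs[(0, 1)] / m[0]],
        [pairs[(1, 0)] / m[1], pairs[(0, 0)] / m[0]],
    ]
    # Sum of squared corrections, in the simplified final form.
    c = m[1] * T[0][0] * T[1][0] + m[0] * T[0][1] * T[1][1]
    sigma = (c / (n - a - 1)) ** 0.5
    return T, sigma


# Hypothetical record for one animal: 1 = correct choice, 0 = incorrect choice.
seq = [1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
T, sigma = transition_estimate(seq)
print(T, round(sigma, 3))
```

Each column of the estimated matrix sums to one, as a matrix of transitional probabilities must.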


6. Variable Transformations. Up to this point we have made the ex- 
plicit assumption that a single transformation could describe the successive 
changes in the probabilities of the alternative responses or alternative se- 
quences of responses. This assumption greatly simplifies the theoretical 
landscape and should be made whenever the data hint that it might be true. 
Simplicity is not, however, an intrinsic property of the behavior of living 
organisms, and so we must be prepared to deal with situations that obviously 
violate the assumption. 
















The assumption that a single transformation is adequate means that the
transitional probabilities are fixed from the first through the last trial. Since
the transitional probabilities determine the sequences of responses that are
probable or improbable, we are assuming that the animal’s course of action
or strategy is fixed throughout the experiment. In a certain sense, therefore,
such an assumption means that there is no learning at all; as soon as the
experimental situation is encountered for the first time, the subject adopts
the set of transitional probabilities that will later describe the statistical
properties of his behavior after he has had long experience in the situation.

The assumption of a single transformation would be justified, for ex- 
ample, after a long series of alternate conditioning and extinction. In this 
experiment the subject is able to evolve a single transformation for the re- 
inforcement conditions and another for the extinction conditions. Or if an 
animal has adopted a stable mode of behavior in a situation and then is 
temporarily distracted in some way, his return to normal when the im- 
pediment is removed might be expected to follow a single transformation. 
But in most of the situations that are studied experimentally there is no a
priori reason to expect that a single transformation will be adequate, and
there are several reasons to expect that it will not be. 

In order to illustrate what is involved in the assumption of a single 
transformation, Table I has been prepared to show one case where the 
assumption is correct and another where the assumption is wrong. Once more 
we consider the data from 10 rats on 20 consecutive choices in a T-maze.
The symbol 1 represents a correct choice, and 0 represents an incorrect choice. 
In Tables IA and IB the numbers of rats making the correct choice are the 
same, and both are the same as the example fitted in the preceding section. 


TABLE I
Hypothetical Data for Ten Rats on Twenty Trials in a T-Maze

IA. Constant Transformation (Rats 1-10)

Trial   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
Σ       5   7   6   6   8   8   8   7   8   9   8   7   8   9  10  10   8   8   9   9

IB. Variable Transformation (Rats 11-20)

Trial   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
Σ       5   7   6   6   8   8   8   7   8   9   8   7   8   9  10  10   8   8   9   9

[The individual 1/0 entries for each rat are illegible in this copy; only the trial headings and column totals are recoverable.]


From the data in Table I we can estimate the values of $p_1(1)$ and $p_0(0)$
on successive pairs of trials by $m(i,j)/m(i)$.

[Table of estimates of $p_1(1)$ and $p_0(0)$ for IA and IB on each successive pair of trials, 1-2 through 19-20; the individual entries are illegible in this copy.]

There seems to be a clear trend in IB for $p_1(1)$ to increase on successive
trials, whereas no trend for $p_1(1)$ is observable in IA. If we group the trials
by fives to secure more reliable estimates, we get

                   IA                      IB
Trials     p_1(1)    p_0(0)        p_1(1)    p_0(0)
 1-6        0.94      0.67          0.72      0.33
 6-11       0.95      0.89          0.85      0.30
11-16       0.98      0.63          0.93      0.38
16-20       0.92      0.60          0.92      0.60

























Comparisons such as these show that the assumption of a constant 
transformation cannot be checked by the successive distributions alone, for 
IA and IB are identical in this respect. The assumption is justified if the 
analysis of short sequences of trials shows relatively constant transitional 
frequencies, as in IA. If the transitional frequencies show a definite trend, 
as in IB, the assumption is not justified. 

The question is what to do when we face variable transformations.
Whatever we do, the situation will not be simple. If $\ldots PQRST\,d_0$ cannot
be translated into $\ldots TTTTT\,d_0$, the matrix products may get quite
complex. If we could choose P, Q, R, S, T as commutative matrices, it would be
possible to find a simultaneous solution for all of them; all matrices would
have the same characteristic vectors but different characteristic roots.
Unfortunately, however, it does not seem possible in general to choose
commutative matrices with the properties demanded by the data.

If the complexity of the problem is admitted as inevitable, we can still 
look for a matrix function of n, T(n), that changes in some reasonable way 
on successive trials. The following argument illustrates one possible approach. 
We assume that at the beginning of the experiment the subjects are equipped 
with transitional preferences given by the matrix U. After long experience 
in the situation the subjects develop transitional preferences given by the 
matrix V. As the experiment progresses the tendencies represented by U are 
slowly extinguished and those represented by V are slowly strengthened. 
Consider the following sequence of equations: 


\[
\begin{aligned}
T(0) &= U \\
T(1) &= wT(0) + (1 - w)V \\
T(2) &= wT(1) + (1 - w)V \\
&\;\;\vdots \\
T(n) &= wT(n-1) + (1 - w)V,
\end{aligned} \tag{29}
\]


where 0 < w < 1. The rationale for this set of equations is that w represents 
the perseveration of the tendencies on the preceding trial, and (1 — w) 
represents the ability to adopt the new mode of response symbolized by V. 
If the extinction of the old pattern of responses is slow, w is near unity; if 
the old pattern extinguishes rapidly, w is near zero. 

Eq. (29) can be written in terms of U and V:

\[
\begin{aligned}
T(0) &= U = w^0(U - V) + V \\
T(1) &= wU + (1 - w)V = w(U - V) + V \\
T(2) &= w^2U + (1 - w^2)V = w^2(U - V) + V \\
&\;\;\vdots \\
T(n) &= w^nU + (1 - w^n)V = w^n(U - V) + V.
\end{aligned} \tag{30}
\]










In this form it is clear that, since 0 < w < 1, T(n) approaches V as n in- 
creases. The importance of U becomes progressively smaller as the subject 
has more and more experience in the experimental situation. This formulation 
has the advantage that it is relatively easy to compute the successive values 
of T(n), given U and V. The initial and final matrices, U and V, can be given 
theoretically or can be determined from data obtained prior to the first trial 
and after the learned behavior has stabilized again in the new course of action. 
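For illustrative purposes, assume that U and V are known to be

\[
U = \begin{pmatrix} .5 & .5 \\ .5 & .5 \end{pmatrix},
\qquad
V = \begin{pmatrix} .9 & .4 \\ .1 & .6 \end{pmatrix},
\]

and that the weight w is calculated to be 0.8. Then Eq. (30) gives

\[
T(n) = .8^n \begin{pmatrix} -.4 & .1 \\ .4 & -.1 \end{pmatrix}
+ \begin{pmatrix} .9 & .4 \\ .1 & .6 \end{pmatrix}.
\]

Then on successive learning trials we have:

n:         0     1     2     3     4     5     6     7     8     9     10    ...
p_A(A):   .5    .58   .644  .695  .736  .768  .796  .816  .832  .846  .857   ...
p_B(B):   .5    .52   .536  .549  .559  .567  .574  .579  .583  .587  .589   ...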


Next we calculate the proportions of right and wrong responses on successive
trials. This is given by the equation:

\[
\begin{aligned}
T(0)d_0 &= d_1 \\
T(1)d_1 &= d_2 = T(1)T(0)d_0 \\
T(2)d_2 &= d_3 = T(2)T(1)T(0)d_0 \\
&\;\;\vdots \\
T(n)d_n &= d_{n+1} = \Big[\prod_{i=0}^{n} T(i)\Big] d_0.
\end{aligned} \tag{31}
\]

It is assumed that T(0) = U and $d_0$ are known from preliminary
experimentation. Assume the boundary condition $d_0$ = (.5, .5). Then direct
computation gives the values:

n:            1     2     3     4     5     6     7     8     9     10    ...   ∞
p^(n)(R):    .5    .53   .559  .587  .614  .639  .662  .683  .700  .716   ...  .800

Considerable care must be taken with such iterated computation, for the
errors are cumulative.
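
The iterated computation can be sketched in Python as follows. The sketch assumes the illustrative values used in the text (U, V, w = 0.8, and a starting distribution of (.5, .5)) and the closed form T(n) = wⁿ(U − V) + V; the function names are hypothetical. Working in floating point throughout avoids the accumulation of rounding error that affects a hand computation carried to three places.

```python
U = [[0.5, 0.5], [0.5, 0.5]]   # initial transitional preferences
V = [[0.9, 0.4], [0.1, 0.6]]   # final transitional preferences
w = 0.8                        # perseveration weight


def T(n):
    """T(n) = w**n * (U - V) + V, the closed form of the recursion."""
    wn = w ** n
    return [[wn * (U[i][j] - V[i][j]) + V[i][j] for j in range(2)] for i in range(2)]


def apply(matrix, d):
    return [sum(matrix[i][j] * d[j] for j in range(2)) for i in range(2)]


d = [[0.5, 0.5]]                    # d[0] is the starting distribution
for n in range(10):
    d.append(apply(T(n), d[n]))     # d_{n+1} = T(n) d_n

p_right = [dist[0] for dist in d]   # proportion of right responses
print([round(p, 3) for p in p_right[1:]])
```

The first entries of T(n) reproduce the tabulated transitional probabilities, and the proportions of right responses agree with the tabulated values to within a unit or two in the third decimal place.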
It should be noted that if w = 0, the variable case reduces to the constant
case, for then T(n) = V and $\prod T(i) = T^n$. Similarly, if w = 1, then T(n) = U
and we again have a single transformation.

A special case arises if U and V commute, UV = VU, for then T(n)
and T(n + k) also commute. If two matrices with distinct roots commute,
then one can be written as a polynomial in terms of the other, with scalar
coefficients. Thus if the matrices A and B commute, we can write, according
to Eq. (15) and (16),

\[
\begin{aligned}
B &= \lambda_1 f_1(B) + \lambda_2 f_2(B) + \cdots + \lambda_N f_N(B) \\
A = g(B) &= g(\lambda_1) f_1(B) + g(\lambda_2) f_2(B) + \cdots + g(\lambda_N) f_N(B),
\end{aligned} \tag{32}
\]

where $\lambda_i$ is the characteristic root of B; $g(\lambda_i)$ is the characteristic root of A;
and for matrices of transitional probabilities $\lambda_1 = g(\lambda_1) = 1$. Thus A and
B have different roots, but $f_i(A) = f_i(B)$. Another way of saying the same
thing is to note that commutative matrices are transformed into their diagonal
form by the same operator. Thus if S transforms A into the diagonal form
$\Lambda_A$, S also transforms B into its diagonal form $\Lambda_B$. The product of A and
B is (since the diagonal matrices $\Lambda_A$ and $\Lambda_B$ obviously commute)

\[
\begin{aligned}
AB &= (S\Lambda_A S^{-1})(S\Lambda_B S^{-1}) = S\Lambda_A\Lambda_B S^{-1} = S\Lambda_B\Lambda_A S^{-1} \\
   &= (S\Lambda_B S^{-1})(S\Lambda_A S^{-1}) = BA.
\end{aligned}
\]


If the matrices T(i) commute, then

\[
\prod_{i=0}^{n} T(i) = S\Big[\prod_{i=0}^{n} \Lambda(i)\Big]S^{-1}, \tag{33}
\]

where the Λ(i) are the diagonal matrices similar to T(i). The product of the
T(i) reduces to the product of diagonal matrices. If all of the Λ(i)'s are
equal, then Eq. (33) reduces to the constant case given by Eq. (5).

Commutative matrices occur when the distribution over the several
alternative responses does not change, although the transitional probabilities
do change. If U has been applied repeatedly, $U^n$ approaches $f_1(U)$ as a limit;
after V has been applied repeatedly, $V^n$ approaches $f_1(V)$. When U and V
commute, $f_1(U) = f_1(V)$, and so both transformations lead to the same
stable distribution. Such a situation might arise in learning a simple
alternation between left and right. The learning might leave p(L) = p(R) = .5,
although the transitional probabilities were altered.
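
A small numerical illustration of the commutative case, in Python, may help. Both matrices below are hypothetical: U represents no sequential preference, and V represents a strong tendency to alternate between left and right. They were chosen only because they commute and share the stable distribution (.5, .5); neither appears in the text.

```python
U = [[0.5, 0.5], [0.5, 0.5]]   # assumed: no sequential dependence
V = [[0.2, 0.8], [0.8, 0.2]]   # assumed: alternation favored after learning
w = 0.8


def T(n):
    wn = w ** n
    return [[wn * (U[i][j] - V[i][j]) + V[i][j] for j in range(2)] for i in range(2)]


def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


def close(A, B, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(A, B) for x, y in zip(ra, rb))


# U and V commute, and so do T(n) and T(n + k).
uv_commute = close(matmul(U, V), matmul(V, U))
t_commute = close(matmul(T(1), T(4)), matmul(T(4), T(1)))

# The distribution over left and right never moves from (.5, .5),
# even though the transitional probabilities change from trial to trial.
d = [0.5, 0.5]
history = []
for n in range(10):
    d = [sum(T(n)[i][j] * d[j] for j in range(2)) for i in range(2)]
    history.append(d)
print(uv_commute, t_commute, history[-1])
```

In this sketch the successive distributions carry no trace of the learning; only the transitional probabilities reveal it.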

This discussion of learning should suggest some of the descriptive possi- 
bilities of systems of dependent probabilities. By this general development 
we arrived at a mathematical description of complex behavioral changes— 
a description that enables us to talk about the gradual replacement of one 
pattern of responses by another. 


Manuscript received 2/8/51 
Revised manuscript received 9/17/51 





















PSYCHOMETRIKA—VOL. 17, No. 2 
JUNE, 1952 


AN INTERNAL CONSISTENCY CHECK FOR SCALE VALUES 
DETERMINED BY THE METHOD OF SUCCESSIVE INTERVALS* 


ALLEN L. EDWARDS
UNIVERSITY OF WASHINGTON
AND


L. L. THURSTONE
UNIVERSITY OF CHICAGO


The method of successive intervals is a psychological scaling procedure 
in which stimuli are classified into successive intervals according to the degree 
of some defined attribute which they are judged to possess. A psychological 
continuum is defined and the scale values are then taken as the medians of the 
distributions of judgments on the psychological continuum. It is assumed 
that the distributions of judgments for each stimulus are normal on the 
psychological continuum as defined. 

An internal consistency check indicates that the cumulative distributions 
of empirical judgments for the various stimuli can be reproduced by means of 
a limited number of parameters with an average error that compares favorably 
with that usually reported for paired comparison data. Furthermore, the 
scale values obtained by successive interval scaling, for the data reported, are 
shown to be linearly related to those obtained by the method of paired com- 
parisons. 


I 


The problem involved in psychological scaling is the ordering of a group 
of stimuli, objects, or situations on a psychological continuum when the relative 
positions of the same stimuli on a corresponding physical continuum are 
unknown. Methods proposed for obtaining the psychological scale values of 
stimuli have, for the most part, been based upon Thurstone’s (6, 7) well- 
known law of comparative judgment. 

The law of comparative judgment assumes that for each stimulus $R_i$ on
the physical continuum there is associated a most frequently aroused or
modal discriminal process $S_i$ on the psychological continuum. Since
discriminal processes other than $S_i$ may also be evoked by $R_i$, it is necessary
to make some assumption concerning the distribution of these processes.
Thurstone makes the plausible assumption that the distribution of all
discriminal processes aroused by $R_i$ is normal about the modal discriminal
process $S_i$.


*This paper was written while the first author was a post-doctoral Research Training 
Fellow of the Social Science Research Council studying at the University of Chicago. It 
reports research undertaken in cooperation with the Quartermaster Food and Container 
Institute for the Armed Forces. The views or conclusions contained in this report are those 
of the authors. They are not to be construed as necessarily reflecting the views or endorse- 
ment of the Department of Defense. 












For any given stimulus $R_i$, two parameters on the psychological
continuum are thus involved: $S_i$, the modal discriminal process which is taken
as the psychological scale value of $R_i$, and $s_i$, the standard deviation or
dispersion of the discriminal processes aroused by $R_i$. If a second stimulus
$R_j$ is introduced, it is again assumed that $R_j$ is associated with a modal
discriminal process $S_j$ and that the distribution of discriminal processes
aroused by $R_j$ is normal about the modal value. From statistical
considerations, the psychological scale separation of $S_i$ and $S_j$ may be expressed as
a normal deviate $X_{ij}$. It then follows that the scale separation of $S_i$ and
$S_j$ on the subjective continuum is

\[
S_i - S_j = X_{ij}\sqrt{s_i^2 + s_j^2 - 2r_{ij}s_is_j}, \tag{1}
\]

which is Thurstone’s law of comparative judgment, Case I.


II 


In the method of paired comparisons, judgments of the nature $R_i > R_j$
are obtained. Taking a sufficient number of stimuli in all possible pairs and
obtaining comparative judgments from a large number of subjects, the
proportions $p_{i>j} = p_{ij}$ can be determined. By means of a table of the normal
probability curve, the $X_{ij}$ values corresponding to the $p_{ij}$ can be obtained.
If the psychological scale value of one of the n stimuli is then taken as an
origin, the psychological scale values of the other n − 1 stimuli, in terms of
the arbitrary origin, can be determined from Equation (1).

The method of equal-appearing intervals involves the rating or sorting of 
stimuli into one or another of various categories. The categories are so ar- 
ranged that they represent increasing degrees of the attribute possessed by 
the stimuli and being scaled. The subject’s task is to classify the stimuli 
so that they are arranged in groups, each group being separated from the 
next by an apparent “equal” interval. The method has the virtue of re- 
quiring that the subject make only one judgment for each stimulus, whereas 
the method of paired comparisons requires that the subject make (n)(n — 1)/2 
judgments. 

Empirical studies (4) demonstrate quite conclusively that the intervals 
on the subjective continuum are not equal, as the method of equal-appearing 
intervals assumes. The central intervals, in general, tend to be smaller than 
those toward the extremes. This finding is associated with the skewed dis- 
tributions of judgments or ratings usually obtained for stimuli whose scale 
values fall toward one or the other extreme of the continuum. As a result of 
this end-effect, if the same stimuli are scaled by paired comparisons and 
equal-appearing intervals, the plot of the two sets of scale values shows a
decided departure from linearity toward the extremes. 

Various procedures have been suggested for taking into account the
inequalities in the widths of the intervals. Saffir (5), for example, describes a
previously unpublished technique of Thurstone’s called the method of
successive intervals. Guilford (3) calls his procedure the method of absolute scaling,
while Attneave (1), using a similar technique, calls it the method of graded
dichotomies. The term method of successive intervals seems to describe all of
these procedures and it will be used in that way here.

It is one of the virtues of the method of paired comparisons that it 
provides an internal consistency check. By working backward from the
psychological scale values, one can arrive at a matrix of theoretical
proportions $p'_{ij}$. The discrepancies between the theoretical proportions and the
observed proportions provide a measure of the consistency of the obtained 
psychological scale values with the empirical data. We shall, in the remainder 
of this paper, describe an internal consistency check appropriate to successive 
interval data. 


III 


In scaling by the method of successive intervals, we assume that a given 
group of stimuli can be ordered upon an unknown physical continuum accord- 
ing to a defined attribute which the stimuli are assumed to possess in varying 
but unknown degree. We assume also that the stimuli can be rated or sorted 
into successive intervals by a group of subjects in terms of judgments of the 
degree of the attribute possessed by the stimuli. Upon the basis of the 
frequencies with which each stimulus is placed in each interval, a psychological 
continuum is defined. 

It is assumed that the projections of the discriminal processes upon the 
psychological continuum are normal for each stimulus. The psychological 
scale values of the stimuli will therefore be given by the median discriminal 
process as projected upon the psychological continuum. From a knowledge of 
the psychological continuum and the scale values of the stimuli, we further 
assume that it is possible to reproduce, within a specified margin of error, the 
empirical distributions of judgments. While we make the assumptions noted, 
it is to be emphasized that the plausibility of these assumptions may be tested 
as a consequence of applying the method. 

In setting up the successive intervals or categories, we may provide as 
many anchorage points as seems desirable. For example, we may choose 
to anchor the two extremes with some such descriptive phrases as “Like
very much—Like very little” or “Highly favorable—Highly unfavorable.”
We may also decide to anchor the middle category with a descriptive phrase
such as “Average,” “Neither like nor dislike,” or “Neutral.” If it seems
desirable to anchor other categories, this may be done. Under any circum- 
stances, we assume that the intermediate categories represent varying degrees 
of the attribute as defined for the subjects. 

The successive intervals should be sufficient in number to offset the 
possibility that the scale value of any stimulus will fall in either extreme. 










As we shall show later, the psychological widths of the two extreme intervals 
are indeterminate. Thus if more than 50 per cent of the judgments for any 
given stimulus fall in either the extreme left or the extreme right category, 
its scale value cannot be determined by the procedures described. It would 
seem obvious that in such cases we need to increase the number of categories, 
for the modal discriminal process may fall either in the extreme category now 
in use or possibly beyond this. 

If the stimuli are now rated or sorted into the successive intervals by a 
large number of subjects, we shall have available a frequency distribution for 
each stimulus showing the number of times it is placed in each of the succes- 
sive intervals. These frequencies may be cumulated, say from left to right, 
and the cumulated frequencies may be expressed as cumulative proportions 
of the total number of judgments. 


TABLE 1

Cumulative Distribution of Judgments for Food Preference Data of Thurstone
(N = 253 or 254)

                              DISLIKE                   NEUTRAL             LIKE
Stimuli                Extreme Strong Moderate  Mild               Mild Moderate Strong Extreme
                          1      2       3       4         5         6      7       8      9
 1. Vanilla ice cream    .00    .00     .01     .02       .05       .16    .38     .79    1.00
 2. Cantaloupe           .00    .01     .01     .04       .06       .21    .54     .91    1.00
 3. Chocolate cake       .00    .00     .01     .03       .12       .25    .55     .86    1.00
 4. Blueberry pie        .01    .01     .02     .08       .15       .33    .61     .92    1.00
 5. Pineapple            .01    .02     .02     .05       .08       .26    .64     .92    1.00
 6. Applesauce           .00    .00     .01     .04       .09       .40    .79     .98    1.00
 7. Rice pudding         .02    .04     .11     .19       .30       .63    .87     .98    1.00
 8. Jello                .00    .02     .04     .12       .22       .55    .88     .99    1.00
 9. Rhubarb              .06    .11     .18     .30       .37       .58    .80     .94    1.00
10. Roquefort cheese     .07    .15     .21     .31       .41       .63    .80     .94    1.00



The cumulative distributions for each stimulus may be combined in a
matrix of order n × r where n is the number of stimuli rated and r is the
number of categories or intervals. The cell entries of this matrix (Table 1)
will be the $p_{jk}$’s showing the proportion of judgments for stimulus j which
fall within or below a given category k. All entries in the extreme right
category will of necessity be equal to 1.00. Consequently, only the entries in the
matrix n × (r − 1) will be potentially free to vary. Thus for the 10 food
stimuli of Table 1, each of which was rated or sorted into one of 9 categories,
we have (10)(8) = 80 possible empirical proportions.













If, from the matrix of cumulative proportions (Table 1), we take the
values $1 - p_{jk}$ we shall have a proportion showing the number of times that
stimulus j was placed above the kth category. From the assumptions
previously made, if the value of $1 - p_{jk}$ is .50, the scale value of stimulus j would
be located at precisely the upper limit of the kth category (or the lower limit
of the kth + 1 category). The corresponding normal deviate of this boundary
would be 0.0. If the value of $1 - p_{jk}$ is less than .50, say .25, then we know
that the upper limit of the kth category (or the lower limit of the kth + 1
category) deviates positively from the scale value of stimulus j. The normal
deviate for this boundary, in terms of stimulus j, would be .67. Similarly, if
the value of $1 - p_{jk}$ is greater than .50, say .80, then we know that the upper
limit of the kth category (or the lower limit of the kth + 1 category) deviates
negatively from the scale value of stimulus j. The corresponding normal
deviate would be −.84.*
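
The conversion from cumulative proportions to normal deviates can be sketched with Python's standard library in place of a printed table of the normal curve. The function name `boundary_deviate` and the treatment of the cutoffs are assumptions made for illustration: proportions outside the range .05 to .95 are returned as `None`, following the footnoted convention that extreme values may be ignored.

```python
from statistics import NormalDist


def boundary_deviate(p_jk, lo=0.05, hi=0.95):
    """Normal deviate of the upper limit of category k relative to stimulus j.

    p_jk is the cumulative proportion of judgments at or below category k.
    Since 1 - p_jk > .95 is the same condition as p_jk < .05, the cutoffs
    are tested on p_jk directly.
    """
    if p_jk < lo or p_jk > hi:
        return None
    return NormalDist().inv_cdf(p_jk)


# The two cases worked in the text: 1 - p = .25 and 1 - p = .80.
print(round(boundary_deviate(0.75), 2), round(boundary_deviate(0.20), 2))

# Cumulative proportions for one stimulus (Cantaloupe), categories 1 through 8.
cantaloupe = [0.00, 0.01, 0.01, 0.04, 0.06, 0.21, 0.54, 0.91]
deviates = [boundary_deviate(p) for p in cantaloupe]
```

Only the boundaries with usable proportions yield deviates, so a stimulus judged almost unanimously contributes few estimates.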

A stimulus whose frequencies are distributed over all r categories will 
thus provide an estimate of the boundaries of the middle r — 2 categories. 
A stimulus whose frequencies are distributed only in categories 1, 2, 3, and 4, 
will provide estimates of the boundaries of categories 2 and 3. For reasons 
which are obvious, no estimate can be obtained of the lower limit of the first 
interval nor the upper limit of the last interval. The widths of these two 
intervals are indeterminate. 

In the manner described above we obtain the matrix of normal deviates 
(Table 2) corresponding to the interval boundaries. Considering but a single 
stimulus j, an estimate of the width of a given successive interval k on the 
psychological continuum will be given by the difference between the cell 


TABLE 2

Normal Deviates Corresponding to Boundaries of Successive Intervals for Food
Preference Data of Table 1

                               Successive Intervals
Stimuli      1      2      3      4      5      6      7      8      9
   1                                  -1.64   -.99   -.31    .81
   2                                  -1.55   -.81    .10   1.34
   3                                  -1.17   -.67    .13   1.08
   4                           -1.41  -1.04   -.44    .28   1.41
   5                           -1.64  -1.41   -.64    .36   1.41
   6                                  -1.34   -.25    .81
   7                    -1.23   -.88   -.52    .33   1.13
   8                            -1.17   -.77    .13   1.17
   9      -1.55  -1.23   -.92   -.52   -.33    .20    .84   1.55
  10      -1.48  -1.04   -.81   -.50   -.23    .33    .84   1.55





*Extreme values of $1 - p_{jk}$ (greater than .95 or less than .05) may be
ignored, as is usually done in the method of paired comparisons.








174 PSYCHOMETRIKA 


entries $X_{jk}$ and $X_{j,k-1}$. Taking the differences between the successive
entries of Table 2 for each stimulus, we arrive at the difference matrix, Table 3.


TABLE 3

Estimates of Interval Widths Obtained from the Boundaries of Table 2

                            Successive Intervals
Stimuli        2      3      4      5      6      7      8
   1                                      .65    .68   1.12
   2                                      .74    .91   1.24
   3                                      .50    .80    .95
   4                               .37    .60    .72   1.13
   5                               .23    .77   1.00   1.05
   6                                     1.09   1.06
   7                        .35    .36    .85    .80
   8                               .40    .90   1.04
   9          .32    .31    .40    .19    .55    .64    .71
  10          .44    .23    .31    .27    .56    .51    .71
Sum           .76    .54   1.06   1.82   7.21   8.16   6.91
Mean          .38    .27    .35    .30    .72    .82    .99
Cum. Mean     .38    .65   1.00   1.30   2.02   2.84   3.83





The differences in any single column are estimates of the width of a specified 
interval. We assume that the best estimate is the column average.* 
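The averaging step can be sketched as follows. This is an illustrative
reconstruction, not the authors' worksheet: the boundary deviates are
transcribed from Table 2, the placement of each row's entries under particular
intervals is an assumption inferred from Table 3, and the function name
`interval_widths` is hypothetical.

```python
# Boundary deviates from Table 2; None marks a boundary with no estimate.
# Position k-1 in each row holds the upper limit of interval k (assumed layout).
N = None
boundaries = [
    [N, N, N, N, -1.64, -0.99, -0.31, 0.81],
    [N, N, N, N, -1.55, -0.81, 0.10, 1.34],
    [N, N, N, N, -1.17, -0.67, 0.13, 1.08],
    [N, N, N, -1.41, -1.04, -0.44, 0.28, 1.41],
    [N, N, N, -1.64, -1.41, -0.64, 0.36, 1.41],
    [N, N, N, N, -1.34, -0.25, 0.81, N],
    [N, N, -1.23, -0.88, -0.52, 0.33, 1.13, N],
    [N, N, N, -1.17, -0.77, 0.13, 1.17, N],
    [-1.55, -1.23, -0.92, -0.52, -0.33, 0.20, 0.84, 1.55],
    [-1.48, -1.04, -0.81, -0.50, -0.23, 0.33, 0.84, 1.55],
]

def interval_widths(rows):
    """Column averages of X_jk - X_j,k-1 over stimuli that estimate both limits."""
    means = []
    for k in range(1, len(rows[0])):
        diffs = [r[k] - r[k - 1] for r in rows
                 if r[k] is not None and r[k - 1] is not None]
        means.append(round(sum(diffs) / len(diffs), 2))
    return means

widths = interval_widths(boundaries)   # estimated widths of intervals 2 to 8

# Accumulate the widths to place the boundaries on a single continuum.
cumulative, total = [], 0.0
for w in widths:
    total += w
    cumulative.append(round(total, 2))

print(widths)
print(cumulative)
```

Each column is averaged only over the stimuli that supply both of its limits,
which is why the number of contributing stimuli varies from interval to interval.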

Finding the various averages, we arrive at the estimated interval widths 
for the categories 2 through 8, the widths of categories 1 and 9 being inde- 
terminate. We thus arrive at the psychological continuum with intervals 

[Figure: the psychological continuum marked off into successive intervals 1
through 9]

FIGURE 1

Psychological Continuum for 10 Food Stimuli


marked as shown in Figure 1. It is possible to take as an arbitrary origin 
the lower limit of the second interval .38. 

On the basis of the assumptions made, the scale values of the stimuli may 
be easily determined. They will be the medians of the distributions of judg- 
ments on the psychological continuum of Figure 1. 


*The calculations set forth here, although presented in somewhat different form, are 
the same as those followed by Attneave (1) and Guilford (3). 













IV 


Having determined the psychological scale values and the nature of the 
psychological continuum, we are now ready to apply a test of internal con- 
sistency. The test to be described is similar to that used with the method of 
paired comparisons. In the latter method we use the n scale values to obtain 
a set of (n)(n — 1)/2 theoretical proportions. The discrepancies between the 
theoretical proportions and the observed proportions may then be determined. 
If they are small, we have reason to believe that our scale values are consistent 
with the empirical data. Furthermore, we can reproduce, within the observed 
margin of error, the (n)(n — 1)/2 independent empirical proportions by 
means of only n parameters, the scale values of the stimuli. 

In applying an internal consistency test for the method of successive 
intervals, we have available the n scale values and the r — 2 interval widths. 
From these parameters we determine a theoretical cumulative distribution for 
each of the stimuli. In the case at hand, we make use of 17 parameters in 
determining how well we can reproduce 80 cell entries. If the 10 stimuli were 
scaled by the method of paired comparisons, we should use fewer parameters, 
10, in the internal consistency test, but we should also have fewer independent 
proportions, 45. 


TABLE 4

Theoretical Normal Deviates Obtained from the Scale Values and Interval Widths

                              Cumulative Interval Widths
Scale Values
of Stimuli   Stimuli    .00    .38    .65   1.00   1.30   2.02   2.84   3.83
                          1      2      3      4      5      6      7      8
  (3.13)        1     -3.13  -2.75  -2.48  -2.13  -1.83  -1.11   -.29    .70
  (2.75)        2     -2.75  -2.37  -2.10  -1.75  -1.45   -.73    .09   1.08
  (2.71)        3     -2.71  -2.33  -2.06  -1.71  -1.41   -.69    .13   1.12
  (2.51)        4     -2.51  -2.13  -1.86  -1.51  -1.21   -.49    .33   1.32
  (2.54)        5     -2.54  -2.16  -1.89  -1.54  -1.24   -.52    .30   1.29
  (2.23)        6     -2.23  -1.85  -1.58  -1.23   -.93   -.21    .61   1.60
  (1.73)        7     -1.73  -1.35  -1.08   -.73   -.43    .29   1.11   2.10
  (1.90)        8     -1.90  -1.52  -1.25   -.90   -.60    .12    .94   1.93
  (1.75)        9     -1.75  -1.37  -1.10   -.75   -.45    .27   1.09   2.08
  (1.60)       10     -1.60  -1.22   -.95   -.60   -.30    .42   1.24   2.23





The scale values of the 10 stimuli are shown at the left in Table 4. At the
top of the table, we have entered the summation of the intervals on the
psychological continuum. By subtracting the scale values of the stimuli from
the cumulative interval widths, we obtain the $X'_{jk}$ values entered in the
cells of the table. These entries are the theoretical normal deviates
corresponding to the empirical values of Table 2. By reference to the table of
the normal curve, we obtain the theoretical cumulative proportions $p'_{jk}$ of
Table 5.


TABLE 5

Theoretical Cumulative Distributions Obtained from Normal Deviates of Table 4

                          Successive Intervals
Stimuli      1      2      3      4      5      6      7      8
   1        .00    .00    .01    .02    .03    .13    .39    .76
   2        .00    .01    .02    .04    .07    .23    .54    .86
   3        .00    .01    .02    .04    .08    .25    .55    .87
   4        .01    .02    .03    .07    .11    .31    .63    .91
   5        .01    .02    .03    .06    .11    .30    .62    .90
   6        .01    .03    .06    .11    .18    .42    .73    .95
   7        .04    .09    .14    .23    .33    .61    .87    .98
   8        .03    .06    .11    .18    .27    .55    .83    .97
   9        .04    .09    .14    .23    .33    .61    .86    .98
  10        .05    .11    .17    .27    .38    .66    .89    .99


Making the matrix subtraction, $p_{jk} - p'_{jk}$, we obtain the discrepancies
between the theoretical and observed proportions. The average discrepancy
is .025 and this represents the average error in reproducing the empirical
distributions of Table 1 from the scale values and the interval widths. The
average discrepancy obtained here compares favorably with the average error
reported for internal consistency checks applied to paired comparison data.
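The steps of the check can be sketched in Python under stated assumptions. The
cumulative widths are those of Table 3 with a leading .00 for the arbitrary
origin (an inference from the text, not a printed value), the scale value 3.13
is that of stimulus 1 in Table 4, and the average discrepancy is taken here to
be the mean absolute difference, which the text does not spell out. All
function names are hypothetical.

```python
from statistics import NormalDist

# Cumulative interval widths, origin at the lower limit of interval 2 (assumed).
CUM_WIDTHS = [0.00, 0.38, 0.65, 1.00, 1.30, 2.02, 2.84, 3.83]

def theoretical_proportions(scale_value, cum_widths=CUM_WIDTHS):
    """Theoretical cumulative proportions p'_jk for one stimulus."""
    # Theoretical deviate: boundary position minus the stimulus's scale value.
    deviates = [w - scale_value for w in cum_widths]
    return [round(NormalDist().cdf(x), 2) for x in deviates]

def average_discrepancy(observed, theoretical):
    """Mean absolute difference between observed and theoretical proportions."""
    return sum(abs(o - t) for o, t in zip(observed, theoretical)) / len(observed)

row_1 = theoretical_proportions(3.13)
print(row_1)
```

Applying `average_discrepancy` to the full observed matrix of Table 1 and the
corresponding theoretical matrix would give a single summary figure of the kind
reported in the text.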


TABLE 6

Psychological Scale Values of Food Stimuli Obtained by Various Scaling Methods with
Means Taken as Arbitrary Origins

            Unadjusted for Inequalities in         Adjusted for Inequalities in
               Discriminal Dispersions               Discriminal Dispersions

           Successive    Graded       Paired      Successive    Graded       Paired
Stimuli    Intervals   Dichotomies  Comparisons   Intervals   Dichotomies  Comparisons
   1          .84         .74          .86           .83         .76          .84
   2          .46         .44          .51           .44         .42          .51
   3          .42         .37          .48           .35         .39          .50
   4          .22         .15          .41           .22         .16          .51
   5          .26         .29          .21           .25         .28          .16
   6         -.06         .02         -.07          -.05        -.04         -.08
   7         -.56        -.49         -.57          -.53        -.50         -.55
   8         -.38        -.34         -.58          -.36        -.36         -.47
   9         -.54        -.54         -.59          -.51        -.51         -.60
  10         -.68        -.62         -.68          -.66        -.60         -.84






















Hevner (4), for example, reports an average error of .024, Saffir (5), an average 
error of .031, and Guilford (2) a value of .027 for paired comparison data.* 

The internal consistency test described here can also be used with any 
of the variations in successive interval scaling. 

V 

Paired comparison judgments were obtained (8) for the 10 food stimuli 
for which we have here reported the successive interval data. Scale values 
were first obtained by Case V of the law of comparative judgment, assuming 
equal dispersions and correlations. Case III of the law of comparative 
judgment, which takes into account inequalities in the dispersions, was also 
applied. The scale values obtained from Case V and Case III are shown in 
Table 6. There we have also recorded the scale values obtained by two 
successive interval methods: the method of graded dichotomies (1) and the 
method of successive intervals described here. We have for each of the various 
methods a set of scale values obtained without any adjustments for inequalities 
in dispersions and also a set of values adjusted for inequalities in dispersions.†


TABLE 7

Intercorrelations of Scale Values Obtained by Various Methods of Scaling

         Methods without Adjustments for       Methods with Adjustments for
           Inequalities in Dispersions          Inequalities in Dispersions

       Successive    Graded       Paired     Successive    Graded       Paired
       Intervals   Dichotomies  Comparisons  Intervals   Dichotomies  Comparisons
          (1)         (2)          (3)          (4)         (5)          (6)
(1)                   .996         .988         .999        .999         .982
(2)      .996                      .979         .995        .999         .971
(3)      .988         .979                      .986        .984         .991
(4)      .999         .995         .986                     .998         .981
(5)      .999         .999         .984         .998                     .974
(6)      .982         .971         .991         .981        .974





*Another test of internal consistency applied to data obtained by the method of
successive intervals has been made with 17 stimuli and 10 categories of ratings. The
average error was .021 for these data.

†These adjustments for inequalities in dispersions for the two successive interval
methods were made by a procedure suggested by Attneave (1). The data obtained by
successive interval scaling and by the method of paired comparisons indicated that the
dispersions for the various stimuli were not equal. The paired-comparison estimates of the
standard deviations, for example, showed considerable variation, ranging from a low of .52
for stimulus 6 to a high of 1.32 for stimulus 10. Yet the average errors obtained from the
internal consistency check applied to the adjusted and unadjusted values were much the
same, being .027 for each of the two adjusted interval scales compared with .025 and .027
for the unadjusted scales. This finding is supported by another study of 17 stimuli scaled
into 10 categories.





The arbitrary origins in each instance have been taken as the mean of the set of
10 scale values.

It can readily be determined that the scale values obtained by the various
methods are linearly related to one another. Some of the plots are shown
in Figures 2-5.

The product-moment correlations between the scale values obtained by
the various methods are shown in Table 7.
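As an illustration, one entry of Table 7 can be approximated from the rounded
scale values of Table 6. The sketch below correlates the unadjusted successive
interval values with the unadjusted graded dichotomies values; the numbers are
transcribed from Table 6 as printed, so the result may differ in the last
decimal from a computation on unrounded values. The helper `pearson_r` is a
hypothetical name.

```python
from math import sqrt

# Unadjusted scale values for the 10 stimuli, transcribed from Table 6.
successive_intervals = [0.84, 0.46, 0.42, 0.22, 0.26,
                        -0.06, -0.56, -0.38, -0.54, -0.68]
graded_dichotomies = [0.74, 0.44, 0.37, 0.15, 0.29,
                      0.02, -0.49, -0.34, -0.54, -0.62]

def pearson_r(x, y):
    """Product-moment correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

r_12 = pearson_r(successive_intervals, graded_dichotomies)
print(round(r_12, 3))
```

The same function applied to any other pair of columns gives the remaining
off-diagonal entries.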





























[Figures 2-5: four scatter plots of the scale values of the 10 stimuli obtained
by other scaling methods against the scale values obtained by unadjusted
successive intervals (abscissa: UNADJUSTED SUCCESSIVE INTERVALS)]




















[Plot on normal probability paper; one curve is labeled VANILLA ICE CREAM;
abscissa: SUCCESSIVE INTERVALS]


FIGURE 6 


Cumulative Distributions of Judgments for Jello and Vanilla Ice Cream Plotted on Normal 
Probability Paper. Abscissa Represents the Psychological Continuum 


In obtaining the scale values by the method of successive intervals 
described here, we assumed that the discriminal processes for each of the 
stimuli were normally distributed about the scale values on the psychological 
continuum of Figure 1. If the observed distributions of judgments shown 
in Table 1 are plotted on normal probability paper with the abscissa marked 
off in terms of the intervals on the psychological continuum, the trend should 
be linear if the distributions actually are normal. These plots were made for 
each of the 10 stimuli. The results indicate that the assumption is reasonable, 







the fit being particularly good within the .05-.95 limits. The plots for 
Stimulus 1 (Vanilla Ice Cream) and Stimulus 8 (Jello) are shown in Figure 6. 
They are representative of the plots obtained for the other stimuli in the 
series. 


REFERENCES 
1. Attneave, F. A method of graded dichotomies for the scaling of judgments. Psychol.
Rev., 1949, 56, 334-340.
2. Guilford, J. P. Psychometric methods. New York: McGraw-Hill, 1936, p. 231.
3. Guilford, J. P. The computation of psychological values from judgments in absolute
categories. J. exp. Psychol., 1938, 22, 32-42.
4. Hevner, K. An empirical study of three psychophysical methods. J. gen. Psychol.,
1930, 4, 191-212.
5. Saffir, M. A. A comparative study of scales constructed by three psychophysical
methods. Psychometrika, 1937, 2, 179-198.
6. Thurstone, L. L. Psychophysical analysis. Amer. J. Psychol., 1927, 38, 368-389.
7. Thurstone, L. L. A law of comparative judgment. Psychol. Rev., 1927, 34, 273-286.
8. Thurstone, L. L. Unpublished study of food preferences.


Manuscript received 4/16/51 


Revised manuscript received 6/27/51 














PSYCHOMETRIKA—VOL. 17, No. 2 
JUNE, 1952 


THE RELATION OF THE RELIABILITY OF MULTIPLE-CHOICE 
TESTS TO THE DISTRIBUTION OF ITEM DIFFICULTIES 


Frederic M. Lord
EDUCATIONAL TESTING SERVICE 


* 


Under certain assumptions an expression, in terms of item difficulties 
and intercorrelations, is derived for the curvilinear correlation of test score 
on the "ability underlying the test," this ability being defined as the common
factor of the item tetrachoric intercorrelations corrected for guessing. It is 
shown that this curvilinear correlation is equal to the square root of the test 
reliability. Numerical values for these curvilinear correlations are presented 
for a number of hypothetical tests, defined in terms of their item parameters. 
These numerical results indicate that the reliability and the curvilinear corre- 
lation will be maximized by (1) minimizing the variability of item difficulty 
and (2) making the level of item difficulty somewhat easier than the halfway 
point between a chance percentage of correct answers and 100 per cent correct 
answers. 


Brogden (1), working from certain assumptions, computed by formula
the "validity" and reliability of a number of selected hypothetical
free-response tests. The numerical results on "validity" lead to the following
tentative conclusions, among others: (1) If all items are of equal difficulty,
"validity" will be a maximum when that difficulty (the proportion of correct
answers to the item) is .50. (2) Increasing the range of item difficulties
decreases the test "validity" as long as the tetrachoric item intercorrelations
($r'_{ij}$) are not unusually high ($r'_{ij} < .40$, say) and the number of
items ($n$) is not unusually large ($n < 150$, say). Brogden's numerical results
on test reliability confirmed Gulliksen's conclusions (4) that the reliability
of a test increases (a) as the average item intercorrelation increases, (b) as
the dispersion of the item difficulties decreases, and (c) as the mean item
difficulty approaches 50 per cent correct.

Brogden’s work covers only the case where the test items cannot be 
answered correctly by guessing. Our purpose here is to carry through a 
similar, but not wholly parallel, investigation for tests composed of multiple- 
choice items. 

In dealing with the free-response case, Brogden limits consideration to 
tests composed of items whose tetrachoric intercorrelations have only one 
common factor (c). This common factor is taken to be the criterion, so that 
the correlation of test score (number of items answered correctly) with this 
criterion is called the test "validity." [Equivalent assumptions have been
derived by Tucker (9) and used in a study of the relation of $r'_{ij}$ to test
"validity." Wherry and Gaylord (10) present a valuable discussion relevant
to an evaluation of these assumptions.] Now the use of the tetrachoric
correlation implies that there may be assumed to be a normal bivariate 
distribution underlying the fourfold table from which the correlation is 
calculated. This is a reasonable assumption in the case of free-response 
items. Let us now suppose these same free-response items to be supplied 
with several possible answers, so that they are converted to multiple-choice 
items. A portion of the examinees who previously answered certain of the 
items incorrectly will now obtain some correct answers by guessing. This 
means that some frequencies, in effect, are moved from the lower portions 
of the original normal bivariate distribution to the upper portions. A cursory 
examination shows that the resulting distribution can no longer be con- 
sidered as normal bivariate.* Carroll (2, p. 17) describes an appropriate 
adjustment to be made before calculating the tetrachoric correlations in this 
situation. If we were nevertheless to compute tetrachoric correlations from 
the unadjusted multiple-choice item-response data, the common factor of 
these tetrachorics would no longer be the same variable as before—in fact, 
there would now in general be many common factors instead of only one, 
as may be verified by applying Carroll’s adjustment to appropriate hypo- 
thetical data. Since our criterion variable should logically remain the same 
whether we use free-response or multiple-choice items, it is obvious that 
tetrachoric correlations should not be calculated from the unadjusted item- 
response data for the present theoretical purposes. 

As already mentioned, Brogden’s “validity” coefficient is the product- 
moment correlation between test score and criterion (c). However, as Brogden 
himself points out, there necessarily exists a curvilinear relation between the 
test score and the criterion. It will perhaps be worth while to digress slightly 
at this point to show that this curvilinearity is not due to a flaw or peculiarity 
in our criterion, but rather is an inherently necessary attribute of any adequate 
criterion scale. 

Suppose that it were possible to set up some sort of criterion scale such 
that the regression of test score on criterion would be linear. If this could 
be done, there would be two points, $c_0$ and $c_n$, on the criterion scale such that
the expected test score of examinees at these two criterion score levels would 
be 0 and n, respectively. Ordinarily we will not be willing, however, to 
assume that the lower and upper bounds of the ability scale (if indeed this 
scale is to be considered as bounded at all) correspond to the lower and upper 
bounds of the score scale of any particular test. This being the case, it may 


*For example, suppose the two variables having a normal bivariate distribution to
be $x$ and $y$. One characteristic of this distribution is that, if $y_0$ is any given value
of $y$, the conditional probability that $y$ will exceed $y_0$ can be made to approach zero as closely as we
please by assigning x a sufficiently large negative value. Now, granted that every examinee 
responds to every item, the probability of obtaining a correct answer to a multiple-choice 
item never approaches zero, no matter how low the ability of the examinee may be, because 
of the occurrence of guessing. Consequently, the normal bivariate surface cannot provide 
an accurate model for the relation between two multiple-choice items when guessing occurs. 













be supposed that at least potential examinees exist whose criterion scores fall
below $c_0$ or above $c_n$. The test scores obtained by such examinees cannot
fall on the straight regression line that we originally assumed, however, since
this regression line would predict negative scores for examinees below $c_0$ and
higher-than-perfect scores for examinees above $c_n$. It is seen that the
assumption of a straight regression line leads to logical contradictions that
render this assumption untenable. The nature of the regression of test score 
on c is investigated in detail elsewhere (6). 

Because of the essential curvilinearity of the relation between test score 
and criterion, we will here prefer to use a curvilinear correlation coefficient 
between test score and criterion, rather than the product-moment coefficient 
used by Brogden. It would be preferable for us to work with the curvilinear 
correlation of criterion on test score. However, this does not appear to be 
practicable, so we will use the curvilinear correlation of test score on criterion 
instead. 

In order to find a formula for this curvilinear correlation, we will wish 
first of all to investigate the correlation between criterion and true score. 
Brogden uses the term ‘true score’ as synonymous with what we have 
here called the criterion; we will use the term “true score” in its more usual 
sense—to refer to the average score on an infinite number of equivalent 
forms of a test. That the true score in this sense is not the same as the criterion 
score is evident from the fact that a group of examinees whose criterion 
scores are normally distributed will nevertheless have a U-shaped distribu- 
tion of true scores, for example, if the test items are of equal difficulty and 
are highly intercorrelated. 

Since the criterion, c, is the only common factor of the test items, it 
follows that the test items will be uncorrelated when c is fixed. Consequently, 
in a group of examinees all having the same value of c, no one examinee 
will consistently perform better than any other examinee if the number of 
items is sufficiently large. We thus see from the law of large numbers that 
all examinees having a given value of c will necessarily obtain the same 
true score. Excluding from consideration the case when all $r'_{ij} = 1$, we thus
have the important conclusion that true score and criterion score have a 
perfect curvilinear correlation. In other words, the criterion scale is simply 
a (nonlinear) transformation of the true score scale. 

As a by-product of this result, we see that if the true scores on two 
different tests have a perfect curvilinear correlation with each other, then 
both tests must have the same criterion scale, as here defined. We may say 
that this criterion scale remains invariant for all tests that are measures of 
the same ability, no matter how the tests may differ in item difficulties or 
intercorrelations. This result provides a justification for the choice of our 
criterion scale as the most useful measure of the ability underlying the test 
score. 

Now it is well known that the product-moment correlation between test
score ($s$) and true score ($t$) is equal to the square root of the test
reliability, $r_{ss}$, where $r_{ss}$ is the usual product-moment correlation
between the scores on two equivalent tests. Using the usual formula for a
correlation coefficient in terms of the standard error of estimate
($\sigma_{s \cdot t}$), we may write

$$ r_{st} = \sqrt{r_{ss}} = \sqrt{1 - \frac{\sigma_{s \cdot t}^2}{\sigma_s^2}}. \tag{1} $$

Furthermore, the regression of test score on true score is linear, so that
$\sigma_{s \cdot t}^2$ represents the average variance of the distribution of
$s$ for fixed $t$. Now the criterion scale is simply a transformation of the
true score scale. Consequently, $\sigma_{s \cdot t}^2 = \sigma_{s \cdot c}^2$,
where $\sigma_{s \cdot c}^2$ is the average variance of the distribution of
$s$ for fixed $c$. The curvilinear correlation of test score on criterion
($\eta_{sc}$) is therefore equal to the square root of the test reliability:

$$ \eta_{sc} = \sqrt{r_{ss}}. \tag{2} $$



We will next need to express $\eta_{sc}$ as a function of the item difficulties
and intercorrelations. We will work first of all with the case of free-response
items. We will let $x_i$ ($i = 1, 2, \ldots, n$) represent the "item score," so that
$x_i = 1$ when the item is answered correctly and $x_i = 0$ when it is answered
incorrectly. It is assumed that the examinee answers every item. We thus
see that $s = \sum_i x_i$.

Taking the test reliability to be the correlation between two equivalent
forms of the same test, and using the usual formula for the correlation of
sums, we have

$$ r_{ss} = \frac{\sum_i \sum_j \sigma_i \sigma_j r_{ij}}{\sigma_s^2}, \tag{3} $$

where $\sigma_i$ is the standard deviation of $x_i$, $\sigma_j$ is the standard
deviation of $x_j$ ($j = 1, 2, \ldots, n$), and $r_{ij}$ is the fourfold point
correlation (product-moment correlation) between $x_i$ and $x_j$.

There is a tetrachoric correlation, $r'_{ij}$, corresponding to every fourfold
point correlation, $r_{ij}$, $i \neq j$ (since both correlations are calculated
from the same fourfold table). The formula for $r_{ij}$ may be written (11, p. 253)

$$ r_{ij} = \frac{A_{ij} - p_i p_j}{\sqrt{p_i q_i p_j q_j}}, \tag{4} $$

where $p_i$ is the item difficulty (proportion of correct answers), $q_i = 1 - p_i$,
and $A_{ij}$ is the proportion of examinees answering both item $i$ and item $j$
correctly. Now $r'_{ij}$ is also a function of these same values; or instead, if we
wish, we may say that $A_{ij}$ is a function of $r'_{ij}$, so that
$A_{ij} = A(r'_{ij}, p_i, p_j)$, say. This last function is tabled by Pearson
(7, Vol. II). With the help of these tables, any value of $r'_{ij}$ may thus be
converted into the corresponding value of $r_{ij}$.
It is well known that

$$ \sigma_i = \sqrt{p_i q_i}. \tag{5} $$

Substituting (4) and (5) in (3), we obtain

$$ r_{ss} = \frac{\sum_i \sum_j (A_{ij} - p_i p_j)}{\sigma_s^2}. \tag{6} $$

Similar procedures give us

$$ \sigma_s^2 = \sum_i \sigma_i^2 + \sum_{i \neq j} \sigma_i \sigma_j r_{ij}
   = \sum_i p_i q_i + \sum_{i \neq j} (A_{ij} - p_i p_j). \tag{7} $$

For the case of free-response items we have thus expressed the test reliability
as a function of the item difficulties and tetrachoric intercorrelations.

Carroll (2, eq. 30), and also Plumlee (8), have provided a formula
expressing the reliability of the score ($S$) on a set of items affected by chance
success as a function of the reliability of the score ($s$) on a set of comparable
items unaffected by chance success. It is assumed that all examinees who
do not know the answer to a multiple-choice item have 1 chance in $k$ of
answering it correctly, where $k$ is some unknown constant to be determined
from the data or from theoretical considerations. In our notation, their
formula becomes

$$ r_{SS} = \frac{(k - 1)\, r_{ss}\, \sigma_s^2}{(k - 1)\,\sigma_s^2 + n - M_s}, \tag{8} $$

where $M_s$ is the mean test score. It is well known that

$$ M_s = \sum_i p_i. \tag{9} $$

Substituting (6), (7), and (9) in (8) and rearranging terms, we find finally
that

$$ r_{SS} = \frac{\sum_i \sum_j A_{ij} - \left(\sum_i p_i\right)^2}
   {\sum_{i \neq j} A_{ij} - \left(\sum_i p_i\right)^2
    + \dfrac{k - 2}{k - 1} \sum_i p_i + \dfrac{n}{k - 1}}. \tag{10} $$
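The substitution leading to equation (10) can be checked numerically. The
sketch below evaluates equation (8) from equations (6), (7), and (9) and
compares it with the rearranged closed form. The three-item test, its
difficulties, and its matrix of joint proportions (including the diagonal
entries for pairs of equivalent items) are hypothetical values chosen only for
illustration, as are the function names.

```python
def free_response_reliability(p, A):
    """Equations (6) and (7): returns (r_ss, variance of s)."""
    n = len(p)
    idx = range(n)
    numerator = sum(A[i][j] - p[i] * p[j] for i in idx for j in idx)
    variance = sum(pi * (1 - pi) for pi in p) + sum(
        A[i][j] - p[i] * p[j] for i in idx for j in idx if i != j)
    return numerator / variance, variance

def r_SS_eq8(p, A, k):
    """Equation (8), with M_s taken from equation (9)."""
    r_ss, variance = free_response_reliability(p, A)
    n, mean_score = len(p), sum(p)
    return (k - 1) * r_ss * variance / ((k - 1) * variance + n - mean_score)

def r_SS_eq10(p, A, k):
    """Equation (10), the rearranged form."""
    n, sum_p = len(p), sum(p)
    idx = range(n)
    all_A = sum(A[i][j] for i in idx for j in idx)
    off_A = sum(A[i][j] for i in idx for j in idx if i != j)
    return (all_A - sum_p ** 2) / (
        off_A - sum_p ** 2 + (k - 2) / (k - 1) * sum_p + n / (k - 1))

# Hypothetical three-item test (illustrative numbers only).
p = [0.4, 0.5, 0.7]
A = [[0.20, 0.26, 0.33],
     [0.26, 0.30, 0.40],
     [0.33, 0.40, 0.52]]

print(r_SS_eq8(p, A, k=4), r_SS_eq10(p, A, k=4))
```

For very large `k` both functions approach the free-response reliability of
equation (6), since the guessing terms vanish.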





If we let $k \to \infty$, equation (10) becomes the same as equation (6) and
may be used for the case of free-response items. In order to use equations
(6) and (10), some method of estimating the values of $A_{ii}$, $r'_{ii}$, or
$r_{ii}$ must be decided upon. The method to be used here will in general give
slightly different results than the Kuder-Richardson Case 2, formula 5, used by
Brogden for the free-response case. The value of $r_{ii}$ used in the Case 2
Kuder-Richardson formula is estimated by means of the assumption that

$$ \frac{r_{is}}{\sqrt{r_{ii}\, r_{ss}}} = 1, $$

where $r_{is}$ is the product-moment correlation (point-biserial correlation)
between item $i$ and test score, and $r_{ii}$ is the product-moment correlation
(fourfold-point correlation) between two equivalent items.

Now (ignoring the spuriousness introduced by the presence of item $i$ in
test $s$) the ratio on the left of this equation represents the product-moment
correlation that would be found between an infinitely long test composed
entirely of items equivalent to item $i$ and an infinitely long test composed
entirely of subtests equivalent to test s. Except in the case where all items 
in the actual test are themselves equivalent to each other with respect to 
both item difficulty and item intercorrelations, the frequency distributions 
of the proportion of correct answers will not in general be the same for the 
two infinitely long tests under discussion. Consequently the two infinitely 
long tests cannot in general have a perfect product-moment correlation. 

For this reason, we will prefer here to work with $r'_{ii}$ (from which $r_{ii}$
may be derived if desired), determining the values of $r'_{ii}$, and thus the
values of $A_{ii}$ needed for equation (10), by the requirement that the matrix
of tetrachoric item intercorrelations shall be of rank one when the values of
$r'_{ii}$ are placed in the diagonal. It is seen that in the special case where all
tetrachoric item intercorrelations, $r'_{ij}$, have the same value, all $r'_{ii}$ will also
be equal to this value.

As already indicated, the curvilinear correlation of test score on criterion
is the square root of the test reliability. For the case of multiple-choice items,
this relation is written

$$ \eta_{Sc} = \sqrt{r_{SS}}. \tag{11} $$

We will use (10) and (11) to calculate values of $\eta_{Sc}$ for various hypothetical
tests.
The hypothetical tests chosen for study are characterized as follows:
1. Number of items: $n = 100$. (The Spearman-Brown formula may be
used on $r_{SS}$ to adjust for different values of $n$.)
2. Probability of chance success on any item in the test: $1/k = 1/2$,
1/3, 1/4, 1/5, or 0.
3. Intercorrelation of items corrected for guessing (2, p. 17): $r'_{ij} = .10$,
.20, or .30 ($r'_{ij}$ is assumed to be the same for all items in any one test,
including the special case when $i = j$).
4. Distribution of item difficulty indices ($h_i$): (a) all items of equal
difficulty, or (b) item difficulty indices approximately normally
distributed.
5. Average item difficulty ($M_h$): (vide infra).
6. Variability of item difficulty ($\sigma_h$): (vide infra).
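The Spearman-Brown adjustment mentioned in item 1 can be sketched as follows.
The function name and the reliability of .90 used in the example are
hypothetical; only the formula itself is the standard one.

```python
def spearman_brown(reliability: float, n_old: int, n_new: int) -> float:
    """Reliability of a test lengthened (or shortened) from n_old to n_new items."""
    m = n_new / n_old                      # factor by which the length changes
    return m * reliability / (1 + (m - 1) * reliability)

# Hypothetical example: a 100-item test with reliability .90.
for n_new in (50, 100, 200):
    print(n_new, round(spearman_brown(0.90, 100, n_new), 4))
```

The adjustment applies to the reliability, so a value of the curvilinear
correlation must be squared before, and rooted after, the adjustment.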


Our main interest will center on a comparison among tests differing
either in average difficulty or in variability of item difficulty, but not in
both, while all other characteristics of the tests remain constant. Some
comparisons among tests differing from each other only in respect to $k$ or in
respect to $r'_{ij}$ will incidentally be made possible. We may note that a test
composed of items of equal difficulty may be considered as a limiting case
of a test having a normal distribution of item difficulty indices when the
variability of item difficulty vanishes.

Item difficulty may be discussed in terms of

1. $p_i$, the actual proportion of correct answers to item $i$;

2. $p'_i = \dfrac{k p_i - 1}{k - 1}$, an estimate of the proportion of examinees who
actually know the answer to the item;

3. $h_i$, the relative deviate of the standardized normal curve above which
$p_i$ of the frequency lies.


The item difficulties of the hypothetical tests are specified initially in terms
of $h_i$, and it is these indices that have approximately a normal distribution
with specified mean and standard deviation. To facilitate interpretation,
however, it will often be helpful to translate mean item difficulty from the
scale of $h_i$ into the scale of $p_i$.
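The translation between these scales can be sketched in Python. The deviate
`h` is converted to a proportion with the standard library's normal
distribution, and the correction for guessing is written in its usual form
under the 1-chance-in-`k` assumption stated earlier; both function names are
hypothetical.

```python
from statistics import NormalDist

def p_from_h(h: float) -> float:
    """Proportion of the unit normal curve lying above the deviate h."""
    return 1.0 - NormalDist().cdf(h)

def proportion_knowing(p: float, k: int) -> float:
    """Estimated proportion who know the answer, given k-choice guessing."""
    # If a proportion p' know the answer and the rest guess with success 1/k,
    # then p = p' + (1 - p') / k, which solves to the expression below.
    return (k * p - 1) / (k - 1)

for h in (0.2, 0.0, -0.2):
    print(h, round(p_from_h(h), 3))

print(proportion_knowing(0.75, k=2))
```

A positive `h` therefore denotes a hard item (fewer than half answer
correctly) and a negative `h` an easy one.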

Table 1 shows the frequency distribution of item difficulty indices for 


TABLE 1

Frequency Distribution of Item Difficulty Indices ($h_i$) for Three Hypothetical Tests,
Showing the Mean ($M_h$) and Standard Deviation ($\sigma_h$) of Each Distribution

     h    Frequency          h    Frequency          h    Frequency
    1.6       2              .8       4
    1.2       6              .6       7              .3       5
     .8      13              .4      12              .2      11
     .4      17              .2      17              .1      21
      0      24               0      20               0      26
    -.4      17             -.2      17             -.1      21
    -.8      13             -.4      12             -.2      11
   -1.2       6             -.6       7             -.3       5
   -1.6       2             -.8       4

            100                     100                     100

    M_h = 0                 M_h = 0                 M_h = 0
    sigma_h = .70           sigma_h = .39           sigma_h = .15










three hypothetical tests, together with the mean and standard deviation of
each distribution. With the exception of the hypothetical tests composed
entirely of items of equal difficulty, every one of the frequency distributions
for the hypothetical tests to be considered is identical with one of the three
shown, except for a possible shift in origin which changes the mean but not
the standard deviation of the distribution. Any of our hypothetical tests
may thus be specified by its values of $k$, $r'_{ij}$, $M_h$, and $\sigma_h$. The
values of $\sigma_h$ given were chosen after a survey of item-analysis data on a
number of Educational Testing Service tests, the results of which indicated
that .70 was about as high a value of $\sigma_h$, and that .40 was about as low a
value, as would usually be found. The value of $\sigma_h = .15$ was chosen as
representing about as low a value as one could expect to achieve if the attempt
were made to build a test composed of items all of equal difficulty.


TABLE 2

Curvilinear Correlation of Test Score with Criterion ($\eta_{sc}$) as a Function of Item Difficulty
($p$ or $h$) for Equivalent* Free-Response Items Having a Specified
Tetrachoric Intercorrelation ($r'_{ij}$)

  Item Difficulty
    p        h       r'_ij = .10    r'_ij = .20    r'_ij = .30
  .421      .2          .9330          .9672          .9796
  .460      .1          .9336          .9675          .9798
  .500       0          .9338          .9676          .9799
  .540     -.1          .9336          .9675          .9798
  .579     -.2          .9330          .9672          .9796





*All items in any one test have equal difficulties and equal intercorrelations. 


For purposes of comparison, Table 2 gives the values of the curvilinear
correlation of test score with criterion for various free-response tests, as
calculated from equations (2) and (10) when $k = \infty$. All items in any one
of these tests are “equivalent” to each other, i.e., they have equal difficulties
and equal intercorrelations. The maximum value of $\eta_{sc}$ is found at $p$ = .50,
as would be expected from Gulliksen’s and Brogden’s results.

Table 3 gives comparable data for various multiple-choice tests, each
test being composed of equivalent items. It is assumed here, for expository
purposes only, that $k$ is equal to the number of choices per item. For any
given value of $k$ and of $r'_{ij}$ in Table 3, the various tabled tests differ from
each other only in item difficulty, so that the effect of variation in this
characteristic may be readily observed.

TABLE 3

Curvilinear Correlation of Test Score with Criterion ($\eta_{sc}$) as a Function of Uncorrected
Item Difficulty ($p$), for Equivalent* $k$-Choice Items Having a Specified Corrected
Tetrachoric Intercorrelation ($r'_{ij}$)

                       Item Difficulty
                         p      (h)      $r'_{ij}$ = .10    $r'_{ij}$ = .20    $r'_{ij}$ = .30

  Test Composed        .691    (+.3)        .7972             .8841             .9204
  of 2-Choice          .750    ( 0 )        .8275†            .9039†            .9347†
  Items                .790    (-.2)        .8383             .9110             .9399
                       .809    (-.3)        .8414             .9131             .9416
                       .828    (-.4)        .8431             .9144             .9427
                       .846    (-.5)        .8435             .9150             .9433
                       .863    (-.6)        .8426             .9148             .9434
                       .879    (-.7)        .8405             .9139             .9430

  Test Composed        .588    (+.3)    (Not computed)        .9227         (Not computed)
  of 3-Choice          .667    ( 0 )                          .9341†
  Items                .720    (-.2)                          .9378
                       .745    (-.3)                          .9388
                       .770    (-.4)                          .9392
                       .794    (-.5)                          .9390
                       .817    (-.6)                          .9384
                       .839    (-.7)                          .9372

  Test Composed        .537    (+.3)    (Not computed)        .9367         (Not computed)
  of 4-Choice          .625    ( 0 )                          .9449†
  Items                .684    (-.2)                          .9473
                       .713    (-.3)                          .9478
                       .742    (-.4)                          .9479
                       .769    (-.5)                          .9475
                       .794    (-.6)                          .9467
                       .818    (-.7)                          .9454

  Test Composed        .506    (+.3)        .8920             .9440             .9635
  of 5-Choice          .600    ( 0 )        .9034†            .9504†            .9679†
  Items                .663    (-.2)        .9063             .9522             .9692
                       .694    (-.3)        .9066             .9525             .9695
                       .724    (-.4)        .9061             .9523             .9695
                       .753    (-.5)        .9049             .9518             .9692
                       .781    (-.6)        .9029             .9509             .9688
                       .806    (-.7)        .9002             .9496             .9680

*All items in any one test have equal difficulties and equal intercorrelations.

†The values of $\eta_{sc}$ for tests having a difficulty level halfway between the chance level and 1.00 are marked
with a dagger to facilitate comparison with the corresponding maximum values of $\eta_{sc}$, which are italicized.


It is commonly considered that the optimum item difficulty is halfway
between the chance level ($p = 1/k$) and 1.00. In Table 3, however, it is
seen that the maximum test reliability and the maximum curvilinear correlation
of test score on criterion, for a test composed of equivalent items, is obtained 
when the item difficulty is somewhat easier than halfway between the chance level 
and 1.00. Related, but not identical, conclusions have been reached by 
Lord (6, p. 35) and by Cronbach and Warrington (3). The conclusion is 
seen to be plausible in view of the fact that multiple-choice items become 
more and more unreliable as the items become more difficult and the amount 
of guessing increases. 
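
The relation between the two difficulty scales in Tables 2 and 3 can be sketched numerically. The sketch below assumes the usual chance-success relation, $p = 1/k + (1 - 1/k)\,[1 - \Phi(h)]$, where $\Phi$ is the standardized normal distribution function; this relation is not restated in the present section and is an assumption here, although it does reproduce the tabled values of $p$. The function names are illustrative only.

```python
import math

def normal_cdf(x):
    """Standardized normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def uncorrected_difficulty(h, k=math.inf):
    """Proportion passing an item with index h when each item has k choices.

    ASSUMPTION: examinees who do not know the answer succeed by chance with
    probability 1/k. With k = infinity (free response) this is just 1 - Phi(h).
    """
    chance = 0.0 if math.isinf(k) else 1.0 / k
    return chance + (1.0 - chance) * (1.0 - normal_cdf(h))

def halfway_difficulty(k):
    """Difficulty halfway between the chance level 1/k and 1.00."""
    return (1.0 + 1.0 / k) / 2.0

for k in (2, 3, 4, 5):
    # Under the assumed relation, h = 0 is exactly the halfway level.
    print(k, round(uncorrected_difficulty(0.0, k), 3), round(halfway_difficulty(k), 3))
```

Under this assumption the rows with $h = 0$ are the halfway-level tests, and the maxima in Table 3 fall at negative $h$, that is, at somewhat easier items.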


TABLE 4

Curvilinear Correlation of Test Score with Criterion ($\eta_{sc}$) as a Function of the Standard
Deviation of the Item Difficulty Indices ($\sigma_h$), for Certain Tests Composed of 2-Choice
Items Having a Specified Corrected Tetrachoric Intercorrelation ($r'_{ij}$) and Average Item
Difficulty Coefficient, $M(p)$

  Tetrachoric Item      Variability of
  Intercorrelation      Item Difficulty
  ($r'_{ij}$)           ($\sigma_h$)          M(p) = .846*     M(p) = .750
      .10                   0                    .844             .827
                           .15                   .841             .825
                           .39                   .829             .812
                           .70                   .799             .778

                                              M(p) = .846*     M(p) = .750
      .20                   0                    .915             .904
                           .15                   .914             .902
                           .39                   .906             .893
                           .70                   .885             .871

                                              M(p) = .863*     M(p) = .750
      .30                   0                    .943             .935
                           .15                   .942             .934
                           .39                   .937             .927
                           .70                   .922             .910

*These are the values of $M(p)$ that yield (approximately) the maximum test validity when $\sigma_h$ = 0.


Table 4 gives values of the curvilinear correlation of test score with
criterion for various tests composed of 2-choice items. The average item
difficulty of a given test is, for convenience, represented by $M(p)$, a quantity
computed from the mean value of $h$ ($M_h$) by the same formula used to compute
$p$ from $h$. Two values of $M(p)$ are treated in the table: (1) the value
found in Table 3 to yield the maximum value of $\eta_{sc}$, and (2) the value (.75)
halfway between the chance level (.50) and 1.00. For any given value of
$M(p)$ and of $r'_{ij}$, the various tabled tests differ from each other only in the
standard deviation of their item difficulty indices, so that study of the effect
on $\eta_{sc}$ of variation in this standard deviation is facilitated. Table 5 presents
similar data for 5-choice tests. For purposes of comparison, Table 6 presents 
similar data for the free-response case. 


TABLE 5

Curvilinear Correlation of Test Score with Criterion ($\eta_{sc}$) as a Function of the Standard
Deviation of the Item Difficulty Indices ($\sigma_h$), for Certain Tests Composed of 5-Choice
Items Having a Specified Corrected Tetrachoric Intercorrelation ($r'_{ij}$) and Average Item
Difficulty Coefficient, $M(p)$

  Tetrachoric Item      Variability of
  Intercorrelation      Item Difficulty
  ($r'_{ij}$)           ($\sigma_h$)          M(p) = .694*     M(p) = .600
      .10                   0                    .907             .903
                           .15                   .906             .902
                           .39                   .899             .896
                           .70                   .884             .879

      .20                   0                    .9525            .9504
                           .15                   .9518            .9498
                           .39                   .948             .946
                           .70                   .939             .936

      .30                   0                    .9695            .9679
                           .15                   .9690            .9675
                           .39                   .966             .965
                           .70                   .960             .958

*This is the value of $M(p)$ that yields (approximately) the maximum test validity when $\sigma_h$ = 0.


The tables show, for the tests studied, that reliability and curvilinear
correlation of test score on criterion decrease as the variability of item difficulty
increases. This result for multiple-choice items corresponds to the previously
cited conclusions with respect to reliability reached by Gulliksen and by
Brogden for the case of free-response items. It should be noted that we have
presented no data for the case when the $r'_{ij}$ are unusually high; Brogden’s
results with respect to reliability, however, indicate that this conclusion is
valid even for $r'_{ij}$ = .80. It is nevertheless still possible, as suggested by
Brogden’s study of the product-moment correlation between test score and
criterion, that the foregoing conclusion will not hold for high values of $r'_{ij}$
when the curvilinear correlation of criterion on test score is investigated in
lieu of $\eta_{sc}$.

TABLE 6

Curvilinear Correlation of Test Score with Criterion ($\eta_{sc}$) as a Function of the Standard
Deviation of the Item Difficulty Indices ($\sigma_h$), for Certain Tests Composed of Free-Response
Items Having a Specified Tetrachoric Intercorrelation ($r'_{ij}$) and an Average Item Difficulty
of $M_p$ = .50

  Tetrachoric Item               Variability of
  Intercorrelation ($r'_{ij}$)   Item Difficulty ($\sigma_h$)     $M_p$ = .50

      .10                            0                              .9338
                                    .15                             .9333
                                    .39                             .930
                                    .70                             .923

      .20                            0                              .9676
                                    .15                             .9674
                                    .39                             .966
                                    .70                             .962

      .30                            0                              .9798
                                    .15                             .9796
                                    .39                             .9785
                                    .70                             .976


Discussion

We have considered only tests composed of items whose intercorrelations
have only one common factor. This common factor is taken to be
the “underlying ability” or “criterion,” with which the test should be as
highly correlated as possible. We have shown, under certain assumptions, 
that the curvilinear correlation of test score on the “underlying ability” or 
“criterion” is equal to the square root of the test reliability. Finally, we 
have made a limited investigation, primarily for multiple-choice tests, of the 
relation of this curvilinear correlation coefficient (and thus of the test re- 
liability) to (1) item difficulty, and (2) variability of item difficulty, for the 
case where the item difficulty indices are approximately normally distributed. 
Our numerical results indicate for the tests investigated that the reliability 
and the curvilinear correlation of the test score on the criterion will be 
maximized by (1) minimizing the variability of item difficulty, and (2) 
making the level of item difficulty somewhat easier than the halfway point 
between a chance level and 1.00. 

The differences between the various numerical values of $\eta_{sc}$ given in
the foregoing tables are in many cases quite small. Even small numerical
differences, however, may have considerable practical importance. For example,
an increase of $\eta_{sc}$ from .954 to .959 represents an increase in reliability
from .91 to .92. From the Spearman-Brown formula it is found that the
number of items (and the total testing time) would have to be increased by
14 per cent if this increase in reliability were to be achieved by lengthening
the test. On the other hand, if a reliability of .91 is adequate, an improved
test having a reliability of .92 could be shortened and the testing time
decreased 12 per cent without lowering the reliability below .91.
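
The two percentages can be checked with the Spearman-Brown formula. The following is a minimal sketch; the function names are illustrative, and the reliabilities .91 and .92 are the rounded values used in the text.

```python
def spearman_brown(reliability, length_factor):
    """Reliability of a test lengthened by length_factor (Spearman-Brown)."""
    return length_factor * reliability / (1.0 + (length_factor - 1.0) * reliability)

def length_factor_needed(current, target):
    """Factor by which a test must be lengthened to move from current to target reliability."""
    return target * (1.0 - current) / (current * (1.0 - target))

n = length_factor_needed(0.91, 0.92)
increase_pct = (n - 1.0) * 100.0          # lengthen the .91 test to reach .92
decrease_pct = (1.0 - 1.0 / n) * 100.0    # shorten the .92 test down to .91
print(round(n, 3), round(increase_pct), round(decrease_pct))
```

The lengthening factor is about 1.137, which gives the 14 per cent increase and, read the other way, the 12 per cent saving quoted above.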

Accurate item-difficulty values yielding the maximum reliability and 
curvilinear correlation for a test composed of m-choice items have not been 
emphasized because the probability of chance success on an m-choice item 
cannot in practice be taken as exactly 1/m. Furthermore, many examinees 
omit many of the items when they do not know the answers, with the result 
that guessing on difficult items is decreased and the reliability of difficult 
items is increased over what would otherwise be the case. Our results never- 
theless suggest that maximum test reliability and curvilinear correlation with 
the criterion will be obtained with items somewhat easier than the halfway 
level previously mentioned. 

The use of easier items will, of course, reduce the discriminating power 
of the test for the most competent examinees. Cronbach and Warrington’s 
work (3) indicates that this effect is not too serious unless the item inter- 
correlations are extraordinarily high. 

These considerations remind us that it is not necessarily always de- 
sirable to maximize the over-all test reliability or correlation with a criterion. 
In the case of a test used with a cutting score, for example, quite different 
measures of the test’s effective “reliability” and ‘‘validity” are required. In 
such a case, the discriminating power of the test will be maximized by mini- 
mizing the variability of the item difficulties, provided the proper average 
difficulty level is chosen (3, 6). 


REFERENCES 


1. Brogden, H. E. Variation in test validity with variation in the distribution of item
difficulties, number of items, and degree of their intercorrelation. Psychometrika,
1946, 11, 197-214.

2. Carroll, J. B. The effect of difficulty and chance success on correlations between items
or between tests. Psychometrika, 1945, 10, 1-20.

3. Cronbach, L. J., and Warrington, W. G. Design study for sonar pitch memory test.
Bureau of Research and Service, College of Education, Univ. of Illinois, Urbana, Ill.,
1951. See also Efficiency of multiple-choice tests as a function of spread of item
difficulties, Psychometrika, 1952, 17, 127-147.

4. Gulliksen, H. The relation of item difficulty and inter-item correlation to test variance
and reliability. Psychometrika, 1945, 10, 79-91.

5. Kuder, G. F., and Richardson, M. W. The theory of the estimation of test reliability.
Psychometrika, 1937, 2, 151-160.

6. Lord, F. M. A theory of test scores. Psychometric Monograph No. 7, 1952.

7. Pearson, K. Tables for statisticians and biometricians. London: Cambridge Univ.
Press, 1924.

8. Plumlee, L. B. The effect of difficulty and chance success on item-test correlations
and test reliability. Psychometrika, 1952, 17, 69-86.

9. Tucker, L. R. Maximum validity of a test with equivalent items. Psychometrika,
1946, 11, 1-18.

10. Wherry, R. J., and Gaylord, R. H. Factor pattern of test items and tests as a function
of the correlation coefficient: content, difficulty, and constant error factors.
Psychometrika, 1944, 9, 237-244.

11. Yule, G. U., and Kendall, M. G. An introduction to the theory of statistics. London:
Charles Griffin and Company, 1940.


Manuscript received 8/6/51 


Revised manuscript received 12/10/51 

















PSYCHOMETRIKA—VOL. 17, No. 2 
JUNE, 1952 


ON THE DETERMINATION OF REDUNDANCIES IN 
SOCIOMETRIC CHAINS* 


IAN C. ROSS AND FRANK HARARY
UNIVERSITY OF MICHIGAN 


The use of a matrix to represent a relationship between the members of a 
group is well known in sociometry. If this matrix is raised to a certain power, 
the elements appearing give the total number of connecting paths between 
each pair of members. In general, some of these paths will be redundant. 
Methods of finding the number of such redundant paths have been developed 
for three- and four-step chains by Luce and Perry (3) and Katz (2), respectively.
We have derived formulas for the number of redundant paths of five
and six steps; and in addition, an algorithm for determining the number of 
redundant paths of any given length. 


1. Introduction 


The problem of redundant paths in communication matrices has been 
open for some time. Solutions for the third and fourth powers of the matrix 
have been published by Luce and Perry (3) and Katz (2). 

A method is presented by which the number of redundant paths in 
any power of the matrix may be calculated. By this method, the matrix 
of redundant paths is expressed in terms of the given matrix and some 
matrix operations. In addition to applying this procedure to three- and four- 
step redundant paths, we have derived formulas for the cases of five- and 
six-step redundant paths. The method is an elementary application of a well- 
known identity involving combinations to a partition function which is 
appropriate for this problem. Numerical examples are given to illustrate 
the use of each formula. 


2. Redundant Paths and Partitions 


2.1. Redundant Paths. In a group of $n$ people (denoted by the letters
$i, j, k, \ldots$) a one-step path exists between $i$ and $j$ if $i$ communicates with $j$.
A path from $i$ to $j$ is a sequence of steps beginning with $i$ and ending with $j$,
$i \neq j$. For example, $ikj$ is a two-step path from $i$ to $j$; $iklkj$ is a four-step
path from $i$ to $j$; etc. A redundant path is a path in which at least one letter
appears more than once.

Let the $n \times n$ matrix $M$ represent the communication relation of the particular
group under consideration, in which the $i,j$ entry is 1 if $i$ communicates
with $j$ and 0 otherwise. It is assumed that no individual communicates
with himself and therefore the main diagonal of the matrix consists only of
zeros. As pointed out by Festinger (1), the $s$th power of the matrix $M$ has
as its $i,j$ entry the total number of $s$-step paths (both redundant and
non-redundant) from $i$ to $j$.


*The research leading to this paper was supported by a grant from the Rockefeller
Foundation.

Let $R_s$ denote the matrix whose $i,j$ entry is the number of redundant
$s$-step paths from $i$ to $j$. Obviously, the $i,j$ entry of $R_s$ for $i \neq j$ is a
non-negative integer which is not greater than the corresponding entry of $M^s$.
It is also obvious that $R_s = M^s$ except possibly on the main diagonal for
all $s \geq n$ since any path involving more than $n - 1$ steps and drawn from
$n$ letters, must contain at least one letter more than once.
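
These definitions can be tried out on a small case. The sketch below uses the six-person matrix that the paper introduces in its Illustrations section and counts paths by direct enumeration; the enumeration and the function names are illustrative and are not the authors' method, which avoids listing paths.

```python
from itertools import product

# Six-person example: M[i][j] == 1 means person i+1 communicates with person j+1.
M = [
    [0, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(A, s):
    P = A
    for _ in range(s - 1):
        P = matmul(P, A)
    return P

def paths(M, i, j, s):
    """Yield every s-step path from i to j as a tuple of 0-based indices."""
    n = len(M)
    for middle in product(range(n), repeat=s - 1):
        seq = (i,) + middle + (j,)
        if all(M[a][b] for a, b in zip(seq, seq[1:])):
            yield seq

def is_redundant(seq):
    """A path is redundant when some letter appears more than once."""
    return len(set(seq)) < len(seq)

all_paths = list(paths(M, 1, 4, 3))      # person 2 to person 5, three steps
redundant = [p for p in all_paths if is_redundant(p)]
print(len(all_paths), matpow(M, 3)[1][4], len(redundant))   # prints 7 7 4
```

The entry of $M^3$ agrees with the enumerated count, and any path of $n = 6$ steps is necessarily redundant because it contains seven letters drawn from six.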


2.2. Partitions. A particular kind of partition is useful in the consideration
of redundant paths; namely, the number of ways in which the positive
integer $s$ can be written as the sum of three integers, such that the first and
third are non-negative but not both zero, the second is greater than one,
and the order of the summands is significant. Let $s = s_1 + s_2 + s_3$ be such
a partition of $s$. A redundant path from $i$ to $j$ is said to satisfy this partition
if the number of steps from the initial $i$ to any of the places in which a
repeated letter occurs other than its last appearance, is $s_1$; the number of
steps from this appearance of the repeated letter to a later appearance is
$s_2$; and the number of steps from this second appearance to the terminal $j$
is $s_3$. Thus all the partitions of 3 are:


3 = 0 + 2 + 1,
3 = 1 + 2 + 0.


For example, the redundant path $ikij$ satisfies the partition 0 + 2 + 1 in
that the number of steps from the first letter to the repeated letter is 0, the
number of steps from the repeated letter to its second appearance is 2, and
the number of remaining steps to the last letter is 1.

Some redundant paths satisfy more than one partition. Thus $ijij$ satisfies
0 + 2 + 1 when $i$ is considered as the repeated letter and 1 + 2 + 0 when
$j$ is so considered. Also two partitions of $s$ may be inconsistent in the sense
that there is no path satisfying both of them.
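
The definition can be made concrete with a short routine that lists the partitions a given path satisfies. This is a sketch only; the function name and the representation of a path as a string of letters are assumptions made for illustration.

```python
def partitions_satisfied(path):
    """Return the set of partitions (s1, s2, s3) satisfied by a path.

    A pair of equal letters at positions a < b gives s1 = a, s2 = b - a and
    s3 = s - b. Equal letters at both ends (a = 0, b = s) are excluded, since
    the first and third summands may not both be zero, and s2 must exceed one.
    """
    s = len(path) - 1
    found = set()
    for a in range(s + 1):
        for b in range(a + 2, s + 1):
            if path[a] == path[b] and (a, b) != (0, s):
                found.add((a, b - a, s - b))
    return found

print(partitions_satisfied("ikij"))   # the single partition 0 + 2 + 1
print(partitions_satisfied("ijij"))   # both partitions of 3
```

A path with no repeated letter satisfies no partition, which is exactly the non-redundant case.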

We conclude this section with two obvious remarks regarding these 
partitions which will be used below. 


Remark 1: The maximum number of partitions which are satisfied by a
redundant $s$-step path is*

$$2\,C\!\left(\left[\frac{s+1}{2}\right],\ 2\right).$$

*The notation $[x]$ used in this formula is defined as usual by: $[x]$ = the
largest integer which is not greater than $x$. Also $C(n,r)$ denotes the number of combinations
of $n$ objects taken $r$ at a time.


Proof: If $s$ is odd, the redundant path with the greatest number of partitions
is $ijij \cdots ij$. The number of partitions in which $i$ is considered as the repeated
letter is $C\!\left(\frac{s+1}{2},\ 2\right)$ since there are $s + 1$ letters (including multiplicity)
half of which are $i$ and these are combined two at a time. Obviously the
corresponding number of partitions considering $j$ as repeated is the same.
Hence the maximum number is $2C\!\left(\frac{s+1}{2},\ 2\right)$ when $s$ is odd.

If $s$ is even, each redundant path with the greatest number of partitions
is of the form $ijij \cdots k \cdots ij$ with exactly one letter $k$. In such a path, $s/2$
of the letters are $i$ and $s/2$ are $j$. Hence the maximum number of partitions
is as above $2C(s/2,\ 2)$. But when $s$ is even, $\frac{s}{2} = \left[\frac{s+1}{2}\right]$.

Therefore regardless of the parity of $s$, the maximum number of partitions
is $2C\!\left(\left[\frac{s+1}{2}\right],\ 2\right)$.


Remark 2: The number of partitions of s is C(s,2) — 1. 


Proof: Clearly the number of partitions of s as defined above is the number 
of partitions of s or less into two summands such that the first is non-negative 
and the second is greater than one but less than s, in which the order of 
summands is significant. For 0 + s + 0 is not a partition of s according to 
our definition. 

When the second summand is s — 1 there are two possible first sum- 
mands, namely 0 and 1. As the second summand decreases by one, the 
number of possible first summands increases by one, etc. Therefore the 
number of partitions of s is 2 + --- + (s — 1) which is well known to equal 
C(s,2) — 1. 
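
Both remarks are easy to check by enumeration for small $s$. The sketch below lists the partitions of $s$ directly from the definition and compares the counts with the two closed forms; the function names are illustrative.

```python
from math import comb

def partitions(s):
    """All (s1, s2, s3) with s1 + s2 + s3 = s, s2 >= 2, s1 and s3 >= 0, not both zero."""
    return [(s1, s2, s - s1 - s2)
            for s2 in range(2, s)              # s2 = s would force s1 = s3 = 0
            for s1 in range(0, s - s2 + 1)]

def max_partitions_satisfied(s):
    """Remark 1: 2 C([(s + 1) / 2], 2)."""
    return 2 * comb((s + 1) // 2, 2)

def satisfied_by(path):
    """Partitions satisfied by a path written as a string of letters."""
    s = len(path) - 1
    return {(a, b - a, s - b)
            for a in range(s + 1) for b in range(a + 2, s + 1)
            if path[a] == path[b] and (a, b) != (0, s)}

for s in range(3, 8):
    print(s, len(partitions(s)), comb(s, 2) - 1, max_partitions_satisfied(s))
```

For $s = 5$ the alternating path $ijijij$ attains the bound of Remark 1, and for $s = 4$ the path $ijkij$ does.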


3. Procedure for Counting Each Redundant Path Exactly Once 


We wish to find for each $i,j$ and each $s$ the number of redundant paths
from $i$ to $j$. The number of redundant $s$-step paths from $i$ to $j$ satisfying
each partition of $s$ may be readily expressed in terms of the given matrix.
However, the sum of these numbers over all partitions may be more than the total number
of redundant paths since a path may satisfy more than one partition. This
difficulty can be handled by the use of the following identity involving
combinations:

$$1 = C(r,1) - C(r,2) + C(r,3) - C(r,4) + \cdots + (-1)^{r+1} C(r,r) = \sum_{t=1}^{r} (-1)^{t+1} C(r,t),$$

whose validity for any positive integer $r$ is implied at once by the binomial
theorem and the fact that $(1 - 1)^r = 0$.
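
The appeal to the binomial theorem can be written out in one line. This is only a restatement of the argument the text indicates, with $t$ as the summation index.

```latex
\begin{aligned}
0 = (1-1)^r &= \sum_{t=0}^{r} (-1)^t\, C(r,t)
             = 1 - \sum_{t=1}^{r} (-1)^{t+1}\, C(r,t), \\
\text{so that}\quad \sum_{t=1}^{r} (-1)^{t+1}\, C(r,t) &= 1 .
\end{aligned}
```

Consequently a path that satisfies exactly $r$ partitions, and is therefore counted $C(r,t)$ times among the sets of $t$ partitions, contributes exactly 1 to the alternating sum.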





For if the total number of redundant paths from $i$ to $j$ is $W_{ij}$, the
value of $W_{ij}$ can be determined as follows.

Let (i) $f_1, f_2, \ldots, f_c$ be the set of all partitions of $s$ ($c = C(s,2) - 1$
by Remark 2),

(ii) $a_x$ be the number of paths satisfying $f_x$,

(iii) $A_1 = \sum_{x=1}^{c} a_x$,

(iv) $a_{x,y}$ be the number of paths satisfying both $f_x$ and $f_y$, $x \neq y$,

(v) $A_2 = \sum_{x<y} a_{x,y}$, the sum extending over $x, y = 1, \ldots, c$,

and in general

(vi) $a_{x_1,x_2,\ldots,x_t}$ be the number of paths simultaneously satisfying
all the partitions $f_{x_1}, f_{x_2}, \ldots, f_{x_t}$, and

(vii) $A_t = \sum_{x_1<x_2<\cdots<x_t} a_{x_1,x_2,\ldots,x_t}$

for each $t$ from 1 to $u = 2C\!\left(\left[\frac{s+1}{2}\right],\ 2\right)$ (by Remark 1).

Then it follows from the combinatorial identity above that

$$W_{ij} = A_1 - A_2 + A_3 - \cdots + (-1)^{u+1} A_u. \tag{1}$$

Let $R_s^{(t)}$ be the matrix whose $i,j$ entry is $A_t$. Then (1) can be expressed
as a matrix equation which yields $R_s$, the matrix of redundant $s$-step paths
whose $i,j$ entry is $W_{ij}$, namely:

$$R_s = R_s^{(1)} - R_s^{(2)} + R_s^{(3)} - \cdots + (-1)^{u+1} R_s^{(u)} = \sum_{t=1}^{u} (-1)^{t+1} R_s^{(t)}. \tag{2}$$
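
Equation (1) can be checked by brute force on a small group. In the sketch below each path from $i$ to $j$ that satisfies $r$ partitions adds $C(r,t)$ to $A_t$, and the alternating sum is compared with a direct count of redundant paths. The enumeration, the function names, and the use of the paper's six-person example matrix are choices made for illustration; the authors' procedure obtains the $A_t$ from matrix expressions instead.

```python
from itertools import product
from math import comb

M = [
    [0, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
]

def satisfied(path):
    """Partitions (s1, s2, s3) satisfied by a path given as a tuple of indices."""
    s = len(path) - 1
    return {(a, b - a, s - b)
            for a in range(s + 1) for b in range(a + 2, s + 1)
            if path[a] == path[b] and (a, b) != (0, s)}

def s_step_paths(M, i, j, s):
    n = len(M)
    for middle in product(range(n), repeat=s - 1):
        path = (i,) + middle + (j,)
        if all(M[a][b] for a, b in zip(path, path[1:])):
            yield path

def redundant_count_by_eq1(M, i, j, s):
    """W_ij from equation (1), for i != j."""
    u = 2 * comb((s + 1) // 2, 2)          # upper limit from Remark 1
    A = [0] * (u + 1)                      # A[t] for t = 1, ..., u
    for path in s_step_paths(M, i, j, s):
        r = len(satisfied(path))
        for t in range(1, r + 1):
            A[t] += comb(r, t)             # the path lies in C(r, t) of the t-fold sets
    return sum((-1) ** (t + 1) * A[t] for t in range(1, u + 1))

def redundant_count_direct(M, i, j, s):
    return sum(1 for p in s_step_paths(M, i, j, s) if len(set(p)) < len(p))

print(redundant_count_by_eq1(M, 1, 4, 3), redundant_count_direct(M, 1, 4, 3))
```

For persons 2 and 5 and $s = 3$ both counts are 4, the value worked out in the paper's illustration.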


4. Redundant Paths of Three and Four Steps 


4.1. Notation. Throughout this subsection, let $A, B$ be $n \times n$ matrices whose
$i,j$ entries are $a_{ij}$, $b_{ij}$, respectively.

This is also expressible by $A = \| a_{ij} \|$, $B = \| b_{ij} \|$.

The usual definitions of matrix addition, $A + B$; matrix subtraction,
$A - B$; multiplication of matrix $A$ by scalar $c$, $cA$; ordinary multiplication
of matrices, $A \cdot B$; and the transpose of matrix $A$, $A'$, are assumed and are
given in detail in Weiss (4). Two other operations will also be required:


(1) Elementwise matrix multiplication $A \times B$:*
$$A \times B = \| a_{ij} b_{ij} \|.$$

(2) The diagonal operator $d(A)$:
$d(A)$ is the matrix whose principal diagonal is that of $A$ and whose
remaining entries are zeros.

Referring specifically now to the given communication matrix $M$, we
introduce the notation $S = M \times M'$ = the matrix of mutual communications
in the group.

Using the notation just introduced, we shall next express $R_s$ in terms
of the given matrix $M$ for $s = 3$ and $s = 4$.


*Elementwise matrix multiplication is due to Hadamard, as mentioned in Paul R.
Halmos, Finite Dimensional Vector Spaces, Princeton University Press, 1942.


4.2. Derivation of $R_3$. By equation (2), $R_3 = R_3^{(1)} - R_3^{(2)}$.

The admissible partitions of 3 are $f_1$: 1 + 2 + 0, and $f_2$: 0 + 2 + 1.
The matrices of all 3-step paths satisfying $f_1$ and $f_2$ are $M \cdot d(M^2)$ and
$d(M^2) \cdot M$, respectively. Therefore $R_3^{(1)} = M \cdot d(M^2) + d(M^2) \cdot M$.

The matrix of all 3-step paths satisfying both $f_1$ and $f_2$ is $M \times M'$.
Therefore $R_3^{(2)} = M \times M' = S$. Hence

$$R_3 = [M \cdot d(M^2) + d(M^2) \cdot M] - S. \tag{3}$$

This is the same formula as obtained by Luce and Perry (3), but the
notation is different.


4.3. Derivation of $R_4$. By equation (2) again, $R_4 = R_4^{(1)} - R_4^{(2)}$. The
partitions of 4 are:

$f_1$: 130 (here 130 is an abbreviation of the partition 4 = 1 + 3 + 0),

$f_2$: 031, $f_3$: 220, $f_4$: 022, $f_5$: 121.

The following table lists these five partitions, with the most general
path from $i$ to $j$ satisfying each and the matrix whose $i,j$ entry is the number
of such paths.


TABLE 1

  Partition       $f_1$               $f_2$               $f_3$                 $f_4$                 $f_5$
  General Path    $ijklj$             $iklij$             $ikjlj$               $ikilj$               $iklkj$
  Matrix          $M \cdot d(M^3)$    $d(M^3) \cdot M$    $M^2 \cdot d(M^2)$    $d(M^2) \cdot M^2$    $M \cdot d(M^2) \cdot M$


We show how the matrix associated with $f_1$ is found. This matrix will
have as its $i,j$ entry the product $m_{ij} m_{jk} m_{kl} m_{lj}$ where $M = \| m_{ij} \|$ and
each repeated index other than $i$ and $j$ is understood to be summed from
1 to $n$. But the matrix whose entries are $m_{jk} m_{kl} m_{lj}$ is $d(M^3)$. Therefore the
desired matrix is $M \cdot d(M^3)$. The other matrices are derived similarly. Thus

$$R_4^{(1)} = [M \cdot d(M^3) + d(M^3) \cdot M] + [M^2 \cdot d(M^2) + d(M^2) \cdot M^2] + M \cdot d(M^2) \cdot M.$$


The consideration of the notion of inconsistent paths mentioned in
Section 2.2 is now relevant. As previously stated, two partitions are called
consistent if the existence of a path satisfying both is possible, and
inconsistent otherwise. To illustrate consistency of two of the partitions of 4,
consider $f_1$ and $f_2$. The most general paths satisfying $f_1$ and $f_2$ are $ijklj$
and $iklij$, respectively. Hence the most general path satisfying both $f_1$ and
$f_2$ is $ijkij$ and such paths may exist.

To illustrate inconsistency consider $f_1$ and $f_3$. The most general path
satisfying $f_3$ is $ikjlj$. Hence the most general path satisfying both $f_1$ and $f_3$
is $ijjlj$ and no such path can exist since $d(M) = 0$.

Another way in which two partitions may be inconsistent is illustrated
by $f_3$ and $f_4$. A path satisfying both of these partitions would have to
exhibit both $i$ and $j$ in exactly the same position in the path.

To find $R_4^{(2)}$ we construct a consistency table for these five partitions.
In this table, 1 denotes consistency; 0 inconsistency.

Proceeding as in these illustrations, the consistency table is:


           $f_2$   $f_3$   $f_4$   $f_5$
  $f_1$      1       0       1       0
  $f_2$              1       0       0
  $f_3$                      0       1
  $f_4$                              1
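
The consistency table can be reproduced mechanically. The sketch below rests on an assumption drawn from the two illustrations above: a set of partitions is treated as inconsistent exactly when the equalities it imposes on the letters of the path would make two adjacent letters equal (impossible since $d(M) = 0$) or would make the first letter $i$ equal to the last letter $j$. The function name and the union-find bookkeeping are illustrative.

```python
def consistent(partition_set, s):
    """True if some path of s steps (s + 1 letters) can satisfy every partition given."""
    parent = list(range(s + 1))            # one node per position 0, ..., s

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for s1, s2, s3 in partition_set:
        # The letters at positions s1 and s1 + s2 must be the same.
        parent[find(s1)] = find(s1 + s2)

    if find(0) == find(s):                 # would force i = j
        return False
    # Adjacent positions may not carry the same letter because d(M) = 0.
    return not any(find(p) == find(p + 1) for p in range(s))

F4 = {1: (1, 3, 0), 2: (0, 3, 1), 3: (2, 2, 0), 4: (0, 2, 2), 5: (1, 2, 1)}
consistent_pairs = [(a, b) for a in F4 for b in F4
                    if a < b and consistent([F4[a], F4[b]], 4)]
print(consistent_pairs)
```

Under this assumption the consistent pairs for $s = 4$ are $f_1 f_2$, $f_1 f_4$, $f_2 f_3$, $f_3 f_5$ and $f_4 f_5$, matching the table.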


The following table lists those pairs of partitions which are consistent,
with the most general path from $i$ to $j$ satisfying each pair and the matrix
whose $i,j$ entry is the number of such paths.


TABLE 2

  Partitions      $f_1, f_2$            $f_1, f_4$        $f_2, f_3$        $f_3, f_5$     $f_4, f_5$
  General Path    $ijkij$               $ijikj$           $ikjij$           $ikjkj$        $ikikj$
  Matrix          $M \times (M^2)'$     $S \times M^2$    $S \times M^2$    $M \cdot S$    $S \cdot M$


The matrix associated with the pair $f_1, f_2$ has as its $i,j$ entry the product
$m_{ij} m_{jk} m_{ki} m_{ij}$. But since each $m_{ij}$ is 0 or 1, $m_{ij} m_{ij} = m_{ij}$. We thus seek
the matrix whose $i,j$ entry is $m_{ij} m_{jk} m_{ki}$. But $m_{jk} m_{ki}$ is the $i,j$ entry of $(M^2)'$
(the transpose of $M^2$). Therefore the desired matrix is $M \times (M^2)'$.

The matrix associated with the pair $f_4, f_5$ has as its $i,j$ entry the product
$m_{ik} m_{ki} m_{ik} m_{kj}$ which equals $m_{ik} m_{ki} m_{kj}$. But $m_{ik} m_{ki}$ is the $i,k$ entry of $S$,
$m_{kj}$ is the $k,j$ entry of $M$ and summation is understood with respect to $k$.
Therefore the desired matrix is $S \cdot M$.

All other derivations of matrices associated with consistent sets of
partitions are similar and will be omitted.








Thus

$$R_4^{(2)} = M \times (M^2)' + S \times M^2 + S \times M^2 + [M \cdot S + S \cdot M].$$

But $R_4 = R_4^{(1)} - R_4^{(2)}$. Hence

$$\begin{aligned}
R_4 = {} & [M \cdot d(M^3) + d(M^3) \cdot M] + [M^2 \cdot d(M^2) + d(M^2) \cdot M^2] \\
& + M \cdot d(M^2) \cdot M - M \times (M^2)' - 2S \times M^2 - [M \cdot S + S \cdot M].
\end{aligned} \tag{4}$$


The computation of these matrix formulas is considerably simplified by
utilizing the following notion of duality: If $a + b + c$ is a partition of $s$,
its dual is the partition $c + b + a$. Thus in the partitions of 4, $f_1$ and $f_2$
are duals, $f_3$ and $f_4$ are duals, and $f_5$ is self-dual. Since the dual of a partition
is obtained by writing it in reverse order, there is a “reverse order” procedure
for determining the matrix associated with the dual of a partition
from the matrix associated with the partition itself, namely reverse the order
of ordinary matrix multiplication ($\cdot$) wherever it occurs and leave all other
operations unchanged. The reason for this is that the general path corresponding
to the dual of a partition is obtained from the original path by
interchanging the letters $i$ and $j$ and then reading the original path backwards.
Thus, in the formula for $R_4^{(2)}$, since the partitions $f_3$ and $f_4$ are dual,
and $f_5$ is self-dual, the consistent pairs $f_3, f_5$ and $f_4, f_5$ are dual. The duality
of the general paths and matrix formulas may be seen by reference to these
partition pairs in Table 2.

In formulas (3) and (4), those terms which are duals of each other are
enclosed in brackets.
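
Formula (4) can be evaluated in the same way as formula (3) and compared with a direct enumeration of redundant four-step paths. The sketch below is illustrative only: the helper names are hypothetical, the matrix is the six-person example from the paper's Illustrations section, and the brute-force check is added here as a sanity test rather than being part of the authors' procedure.

```python
from itertools import product

M = [
    [0, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def hadamard(A, B):
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def diag(A):
    n = len(A)
    return [[A[i][j] if i == j else 0 for j in range(n)] for i in range(n)]

def combine(terms):
    n = len(terms[0][1])
    return [[sum(c * X[i][j] for c, X in terms) for j in range(n)]
            for i in range(n)]

M2 = matmul(M, M)
M3 = matmul(M2, M)
S = hadamard(M, transpose(M))
d2, d3 = diag(M2), diag(M3)

# Formula (4); dual terms are written next to each other.
R4 = combine([
    (1, matmul(M, d3)), (1, matmul(d3, M)),
    (1, matmul(M2, d2)), (1, matmul(d2, M2)),
    (1, matmul(matmul(M, d2), M)),
    (-1, hadamard(M, transpose(M2))),
    (-2, hadamard(S, M2)),
    (-1, matmul(M, S)), (-1, matmul(S, M)),
])

def brute_redundant(M, s):
    """Count redundant s-step paths from i to j (i != j) by enumeration."""
    n = len(M)
    R = [[0] * n for _ in range(n)]
    for seq in product(range(n), repeat=s + 1):
        if (seq[0] != seq[-1] and len(set(seq)) <= s
                and all(M[a][b] for a, b in zip(seq, seq[1:]))):
            R[seq[0]][seq[-1]] += 1
    return R

B4 = brute_redundant(M, 4)
agree = all(R4[i][j] == B4[i][j] for i in range(6) for j in range(6) if i != j)
print(R4[0], agree)
```

The comparison is restricted to $i \neq j$, since the partitions were defined for paths between distinct persons.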


5. Redundant Paths of Five and Six Steps 
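5.1. Derivation of $R_5$.


The partitions of 5 are

  $f_1$: 140        $f_2$: 041
  $f_3$: 230        $f_4$: 032
  $f_5$: 320        $f_6$: 023
  $f_7$: 122        $f_8$: 221
  $f_9$: 131


As in the preceding section, consistency tables are constructed for these
nine partitions taken 2, 3, 4, 5, and 6 at a time. (By Remark 1 of Section 2,
6 is the greatest possible number of simultaneously consistent partitions of
5.) Also as above, the matrices corresponding to the appropriate general
paths are found and substituted into the formula (the special case of equation
(2) for $s = 5$),

$R_5 = R_5^{(1)} - R_5^{(2)} + R_5^{(3)} - R_5^{(4)} + R_5^{(5)} - R_5^{(6)}$, the terms of which are:

$$\begin{aligned}
R_5^{(1)} = {} & [M \cdot d(M^4) + d(M^4) \cdot M] + [M^2 \cdot d(M^3) + d(M^3) \cdot M^2] \\
& + [M^3 \cdot d(M^2) + d(M^2) \cdot M^3] + [M \cdot d(M^2) \cdot M^2 + M^2 \cdot d(M^2) \cdot M] + M \cdot d(M^3) \cdot M, \\
R_5^{(2)} = {} & 3[M \cdot d(M^2)^2 + d(M^2)^2 \cdot M] + [M \cdot d(M \cdot d(M^2) \cdot M) + d(M \cdot d(M^2) \cdot M) \cdot M] \\
& + 2[M \cdot (S \times M^2) + (S \times M^2) \cdot M] + [M \cdot (M \times (M^2)') + (M \times (M^2)') \cdot M] \\
& + [M^2 \cdot S + S \cdot M^2] + M \times (M^3)' + 2M \times M^2 \times (M^2)' + 2M^3 \times S \\
& + M^2 \times M^2 \times M' + d(M^2) \cdot M \cdot d(M^2) + M \cdot S \cdot M, \\
R_5^{(3)} = {} & 6[d(M^2) \cdot S + S \cdot d(M^2)] + 3[M \cdot d(M^2) + d(M^2) \cdot M] \\
& + [M \cdot d(M^2)^2 + d(M^2)^2 \cdot M] + M^2 \times M' + 3M \times S^2, \\
R_5^{(4)} = {} & 9S + 2[S \cdot d(M^2) + d(M^2) \cdot S] + [M \cdot d(M^2) + d(M^2) \cdot M], \\
R_5^{(5)} = {} & 6S, \text{ and} \\
R_5^{(6)} = {} & S.
\end{aligned}$$

Combining these terms, we get

$$\begin{aligned}
R_5 = {} & [M \cdot d(M^4) + d(M^4) \cdot M] + [M^2 \cdot d(M^3) + d(M^3) \cdot M^2] \\
& + [M^3 \cdot d(M^2) + d(M^2) \cdot M^3] + [M \cdot d(M^2) \cdot M^2 + M^2 \cdot d(M^2) \cdot M] \\
& + 2[M \cdot d(M^2) + d(M^2) \cdot M] + 4[d(M^2) \cdot S + S \cdot d(M^2)] \\
& + M \cdot d(M^3) \cdot M + M^2 \times M' + 3M \times S^2 \\
& - [M \cdot d(M \cdot d(M^2) \cdot M) + d(M \cdot d(M^2) \cdot M) \cdot M] \\
& - 2[M \cdot (S \times M^2) + (S \times M^2) \cdot M] - [S \cdot M^2 + M^2 \cdot S] \\
& - 2[M \cdot d(M^2)^2 + d(M^2)^2 \cdot M] - [M \cdot (M \times (M^2)') + (M \times (M^2)') \cdot M] \\
& - M \times (M^3)' - 2S \times M^3 - d(M^2) \cdot M \cdot d(M^2) - M^2 \times M^2 \times M' \\
& - M \cdot S \cdot M - 2M \times M^2 \times (M^2)' - 4S.
\end{aligned} \tag{5}$$

Here $S^2 = S \cdot S$ and $d(M^2)^2 = d(M^2) \cdot d(M^2)$ denote ordinary matrix products.
In this equation, as in the preceding equations (3) and (4), those terms
which are duals of each other are enclosed in brackets.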
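
Because formula (5) has many terms, a numerical check is a useful safeguard when transcribing it. The sketch below evaluates the combined formula term by term for the six-person matrix of the paper's Illustrations section and compares the off-diagonal entries with a brute-force count of redundant five-step paths. The helper names, the abbreviations in the code (`mm`, `had`, `tr`), and the brute-force comparison are all illustrative additions, not part of the original derivation.

```python
from itertools import product

M = [
    [0, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
]
N = len(M)

def mm(A, B):                                   # ordinary product A . B
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def had(A, B):                                  # elementwise product A x B
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def tr(A):                                      # transpose A'
    return [list(col) for col in zip(*A)]

def diag(A):                                    # diagonal operator d(A)
    return [[A[i][j] if i == j else 0 for j in range(N)] for i in range(N)]

def combine(terms):
    return [[sum(c * X[i][j] for c, X in terms) for j in range(N)]
            for i in range(N)]

M2 = mm(M, M); M3 = mm(M2, M); M4 = mm(M3, M)
S = had(M, tr(M)); S2 = mm(S, S)
d2, d3, d4 = diag(M2), diag(M3), diag(M4)
d2sq = mm(d2, d2)
dMdM = diag(mm(mm(M, d2), M))                   # d(M . d(M^2) . M)
SM2 = had(S, M2)                                # S x M^2
MM2t = had(M, tr(M2))                           # M x (M^2)'

R5 = combine([
    (1, mm(M, d4)), (1, mm(d4, M)),
    (1, mm(M2, d3)), (1, mm(d3, M2)),
    (1, mm(M3, d2)), (1, mm(d2, M3)),
    (1, mm(mm(M, d2), M2)), (1, mm(mm(M2, d2), M)),
    (2, mm(M, d2)), (2, mm(d2, M)),
    (4, mm(d2, S)), (4, mm(S, d2)),
    (1, mm(mm(M, d3), M)),
    (1, had(M2, tr(M))),
    (3, had(M, S2)),
    (-1, mm(M, dMdM)), (-1, mm(dMdM, M)),
    (-2, mm(M, SM2)), (-2, mm(SM2, M)),
    (-1, mm(S, M2)), (-1, mm(M2, S)),
    (-2, mm(M, d2sq)), (-2, mm(d2sq, M)),
    (-1, mm(M, MM2t)), (-1, mm(MM2t, M)),
    (-1, had(M, tr(M3))),
    (-2, had(S, M3)),
    (-1, mm(mm(d2, M), d2)),
    (-1, had(had(M2, M2), tr(M))),
    (-1, mm(mm(M, S), M)),
    (-2, had(had(M, M2), tr(M2))),
    (-4, S),
])

def brute_redundant(M, s):
    """Count redundant s-step paths from i to j (i != j) by enumeration."""
    R = [[0] * N for _ in range(N)]
    for seq in product(range(N), repeat=s + 1):
        if (seq[0] != seq[-1] and len(set(seq)) <= s
                and all(M[a][b] for a, b in zip(seq, seq[1:]))):
            R[seq[0]][seq[-1]] += 1
    return R

B5 = brute_redundant(M, 5)
agree = all(R5[i][j] == B5[i][j] for i in range(N) for j in range(N) if i != j)
print(R5[0][5], agree)
```

As one spot check, seven five-step paths lead from person 1 to person 6 in this group, and five of them are redundant.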
5.2. Derivation of $R_6$.


By a similar procedure the formula for the number of redundant six-step
paths is found. The partitions of 6 are:

  $f_1$: 150         $f_2$: 051
  $f_3$: 240         $f_4$: 042
  $f_5$: 330         $f_6$: 033
  $f_7$: 420         $f_8$: 024
  $f_9$: 123         $f_{10}$: 321
  $f_{11}$: 132      $f_{12}$: 231
  $f_{13}$: 141      $f_{14}$: 222


Since the greatest possible number of partitions that are simultaneously
consistent is 6, the formula for $R_6$ is found from

$$R_6 = R_6^{(1)} - R_6^{(2)} + R_6^{(3)} - R_6^{(4)} + R_6^{(5)} - R_6^{(6)}.$$

Because of the length of this formula, it is presented in combined form only.
Re, = [M - d(M’) + d(M*) - M] + [M?’ - d(M‘) + d(M") - M?] 
+ [M® - d(M*) + d(M*) - M*] + [M* - d(M’) + d(M’) - M*] 
+ [M - d(M’) - M* + M*- d(M’) - M) + [M - dM) - 
+ M’- d(M*)-M])+M.- d(M*)- M+ M? - d(M’) - MW 
+ 2[M? - d(M*) + d(M’) - M?] + 4[M - S - d(M°) + d(M’)- S- M] 
+ 4[M -d(M’)-S+S8S-d(M’)-M]+[M-dM-S-M) 
+d(M-S-+-M)-M)+2M-d(M’)-M+4[M - d(S - M’) 
+d(M*-S)-M)+4[M-dM’-S)+d(S- MW’) - M] 
+ [M+ (M’ x M*) + (M’ x M’) - M) + 3[M - (M x 8’) 
+ (M xX S*)- M)+3M x [S- (M’ X M*) + (M’ x M’) - S] 
+ 38M x [S- (MX M*) + (M x M’) - S] 
+ M xX (M’ - d(M’) - M’) + 8M’ x S’ + M’ x M”’ 
+ 2M’ x [M - (M X M’) + (M X M’) - M) + [S-M - d(M’) 
+ d(M*) - M - 8] + 8[(S x M’) - d(M’) + d(M’) - (S X M’)] 
+ 4[(M x M*) - d(M’) + d(M’) - (MX M*’)] + 4[S - d(M*) 
+ d(M*) - 8S] + 2S x (M - d(M’) - M) — [M*-S+ S- M*] 
—(M’-S-M+M.-S- M’] — [M’.- (M x M’) 
+ (M X M”) - M’] — 2[M’ - (S X M’) + (S X M’) - M’] 




— 2|M? - d(M’)? + d(M’)? - M’] — [(M - d(M°))? + (d(M’) - M)’| 
— 2M - d(M’)-M—([M-(M x M*)+ (M x M*)- M] 

— 2[M - (MX M? x M”) + (M X M Xx M”’) - M] 

— [M- (M’ X M? X M’) + (M’ X M’ Xx M’) - My] 

— 2[M -(S x M*) + (S X M*)-M] —-M-(M xX M’)-M 
—2M -(SX M’)-M-4[M-S+S-M] 

— [M - d(M? - d(M’) - M) + d(M - d(M’) - M’) - M] 

— [M - d(M - d(M’) - M’) + d(M” - d(M’) - M) - M) 

— [M - d(M - d(M*) - M) + d(M - d(M’) - M) - M] 

— M - d(M - d(M’) - M) - M — [M’ - d(M - d(M”) - M) 

+ d(M - d(M’) - M) - M’] — 4[M - (d(M’) x d(M)) 

+ (d(M’) x d(M*)) - M] — [d(M’) - M - d(M’) 

+ d(M*) - M - d(M’)] — d(M’) - M? - d(M’) — 4M x [M’- S 

+ S-M’] — Mx M* — 2M X M’ X M®*’ — 2M Xx M” x M 
— M? X M’ X M” — 2M’ X M’ X M* -8SX[M-S+S-M] 
— 2S x M* — 128 x M? — 8S x M”’ — 48°. (6) 


6. Illustrations 
For the sake of simplicity we consider a sociometric matrix $M$ in which
each of six persons chooses three other persons in this group:

$$M = \begin{pmatrix}
0 & 1 & 0 & 1 & 1 & 0 \\
1 & 0 & 0 & 1 & 1 & 0 \\
0 & 1 & 0 & 0 & 1 & 1 \\
1 & 0 & 1 & 0 & 1 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 & 0
\end{pmatrix}$$

(Thus person 2 communicates with persons 1, 4, and 5, etc.) For this matrix,
we find $R_3$, $R_4$, and $R_5$.

By equation (3), R; = M - d(M*) + d(M’) - M — S. The matrix 
M? = M - M is obtained by ordinary matrix multiplication. 





































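$$M^2 = \begin{bmatrix}
3 & 1 & 1 & 2 & 2 & 0 \\
2 & 2 & 1 & 2 & 2 & 0 \\
2 & 2 & 1 & 2 & 2 & 0 \\
1 & 3 & 0 & 2 & 2 & 1 \\
2 & 1 & 1 & 2 & 3 & 0 \\
2 & 2 & 0 & 2 & 2 & 1
\end{bmatrix}, \quad \text{whence} \quad
d(M^2) = \begin{bmatrix}
3 & 0 & 0 & 0 & 0 & 0 \\
0 & 2 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 2 & 0 & 0 \\
0 & 0 & 0 & 0 & 3 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}.$$

Thus

$$M \cdot d(M^2) = \begin{bmatrix}
0 & 2 & 0 & 2 & 3 & 0 \\
3 & 0 & 0 & 2 & 3 & 0 \\
0 & 2 & 0 & 0 & 3 & 1 \\
3 & 0 & 1 & 0 & 3 & 0 \\
3 & 2 & 0 & 2 & 0 & 0 \\
0 & 2 & 1 & 0 & 3 & 0
\end{bmatrix}, \quad
d(M^2) \cdot M = \begin{bmatrix}
0 & 3 & 0 & 3 & 3 & 0 \\
2 & 0 & 0 & 2 & 2 & 0 \\
0 & 1 & 0 & 0 & 1 & 1 \\
2 & 0 & 2 & 0 & 2 & 0 \\
3 & 3 & 0 & 3 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 & 0
\end{bmatrix}.$$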














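To find $S = M \times M'$, we form the elementwise product of M and M',
getting

$$S = \begin{bmatrix}
0 & 1 & 0 & 1 & 1 & 0 \\
1 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 1 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0
\end{bmatrix}.$$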











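Substituting the last three matrices into formula (3), we obtain

$$R_3 = \begin{bmatrix}
0 & 4 & 0 & 4 & 5 & 0 \\
4 & 0 & 0 & 4 & 4 & 0 \\
0 & 3 & 0 & 0 & 4 & 1 \\
4 & 0 & 3 & 0 & 4 & 0 \\
5 & 4 & 0 & 4 & 0 & 0 \\
0 & 3 & 1 & 0 & 4 & 0
\end{bmatrix}.$$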
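The arithmetic of formula (3) can be retraced mechanically. The following is a minimal sketch in plain Python, not part of the original computation; it assumes the choice matrix M as read from the illustration (person 2 choosing persons 1, 4, and 5, and so on), and the helper names `matmul` and `diag_part` are introduced here only for illustration.

```python
# Choice matrix M of the illustration: row i marks the persons chosen by person i + 1.
M = [
    [0, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
]
n = len(M)


def matmul(A, B):
    """Ordinary matrix product of two n x n lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def diag_part(A):
    """d(A): keep the main diagonal of A and set every other entry to zero."""
    return [[A[i][j] if i == j else 0 for j in range(n)] for i in range(n)]


M2 = matmul(M, M)
D = diag_part(M2)
# S = M x M': elementwise product of M with its transpose (mutual choices).
S = [[M[i][j] * M[j][i] for j in range(n)] for i in range(n)]
MD = matmul(M, D)   # M . d(M^2)
DM = matmul(D, M)   # d(M^2) . M
# Formula (3): R_3 = M . d(M^2) + d(M^2) . M - S
R3 = [[MD[i][j] + DM[i][j] - S[i][j] for j in range(n)] for i in range(n)]

for row in R3:
    print(row)
```

Each diagonal entry of $M^2$ equals the corresponding row sum of S, since a two-step path from a person back to himself exists exactly when a choice is reciprocated; this gives a quick check on the intermediate matrices.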






















The matrix $P_s$ of pure, i.e., non-redundant, s-step paths, may be obtained
from the formula

$$P_s = (M^s - R_s) - d(M^s - R_s). \tag{7}$$

In formula (7), $M^s - R_s$ is the matrix obtained by subtracting the matrix
of redundant s-step paths from the matrix of all s-step paths. However the
diagonal of $M^s$, although consisting entirely of redundant paths from i to i,
need not be equal to the diagonal of $R_s$; for the consistency tables for
partitions were developed for i to j paths with $i \neq j$. Therefore, the term
$d(M^s - R_s)$ is subtracted in order to assure that $d(P_s)$ will be zero.

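For the special case s = 3, we get

$$P_3 = (M^3 - R_3) - d(M^3 - R_3).$$

But since $R_3 = M \cdot d(M^2) + d(M^2) \cdot M - S$, it follows at once that
$d(R_3) = 0$. Hence $P_3 = M^3 - R_3 - d(M^3)$, from which

$$P_3 = \begin{bmatrix}
0 & 2 & 2 & 2 & 2 & 1 \\
2 & 0 & 2 & 2 & 3 & 1 \\
6 & 2 & 0 & 6 & 3 & 0 \\
3 & 4 & 0 & 0 & 3 & 0 \\
1 & 2 & 2 & 2 & 0 & 1 \\
6 & 2 & 2 & 6 & 3 & 0
\end{bmatrix}.$$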





We illustrate the (2, 5) entry of $P_3$. The totality of all 3-step paths from
2 to 5 is 2-4-3-5, 2-1-4-5, 2-4-1-5, 2-1-2-5, 2-5-1-5, 2-5-2-5, and 2-5-4-5.
Since there are seven of these paths, of which the last four are redundant, the
(2,5) entries of $M^3$ and $R_3$ are 7 and 4, respectively, and the (2,5) entry
of $P_3$ is 7 - 4 = 3.
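The count for the (2, 5) entry can also be checked by listing the paths directly. The sketch below is illustrative only: it assumes the matrix M of the illustration and takes a path to be redundant when some person appears in it more than once; the function name `paths` is hypothetical.

```python
from itertools import product

# Choice matrix M of the illustration (row i: persons chosen by person i + 1).
M = [
    [0, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
]


def paths(i, j, s):
    """All s-step paths from person i to person j, using 1-based labels."""
    n = len(M)
    found = []
    for middle in product(range(1, n + 1), repeat=s - 1):
        seq = (i,) + middle + (j,)
        # Keep the sequence only if every consecutive pair is an actual choice.
        if all(M[a - 1][b - 1] for a, b in zip(seq, seq[1:])):
            found.append(seq)
    return found


all_paths = paths(2, 5, 3)
redundant = [p for p in all_paths if len(set(p)) < len(p)]   # a person repeats
pure = [p for p in all_paths if len(set(p)) == len(p)]       # all persons distinct

print(len(all_paths), len(redundant), len(pure))
for p in pure:
    print("-".join(str(v) for v in p))
```

Exhaustive listing of this kind grows as $n^{s-1}$ and is practical only for small groups and short paths, which is the reason for seeking matrix formulas for $R_s$.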

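Using formula (4) and working out the details in a similar fashion as
for the case s = 3 above, we get:

$$R_4 = \begin{bmatrix}
19 & 12 & 5 & 18 & 17 & 0 \\
17 & 10 & 5 & 18 & 18 & 0 \\
11 & 14 & 1 & 10 & 16 & 2 \\
13 & 15 & 6 & 10 & 18 & 3 \\
17 & 13 & 5 & 18 & 19 & 0 \\
11 & 13 & 2 & 10 & 15 & 1
\end{bmatrix}.$$

But since

$$M^4 = \begin{bmatrix}
19 & 15 & 7 & 18 & 20 & 2 \\
18 & 16 & 7 & 18 & 20 & 2 \\
18 & 16 & 7 & 18 & 20 & 2 \\
17 & 17 & 6 & 18 & 20 & 3 \\
18 & 15 & 7 & 18 & 21 & 2 \\
18 & 16 & 6 & 18 & 20 & 3
\end{bmatrix},$$

we apply formula (7) with s = 4 to get

$$P_4 = \begin{bmatrix}
0 & 3 & 2 & 0 & 3 & 2 \\
1 & 0 & 2 & 0 & 2 & 2 \\
7 & 2 & 0 & 8 & 4 & 0 \\
4 & 2 & 0 & 0 & 2 & 0 \\
1 & 2 & 2 & 0 & 0 & 2 \\
7 & 3 & 4 & 8 & 5 & 0
\end{bmatrix}.$$

Proceeding similarly for the case s = 5, we get from formulas (5) and (7),

$$R_5 = \begin{bmatrix}
55 & 46 & 20 & 54 & 59 & 5 \\
53 & 48 & 20 & 54 & 60 & 5 \\
51 & 46 & 16 & 50 & 59 & 7 \\
53 & 45 & 21 & 56 & 60 & 6 \\
53 & 47 & 20 & 54 & 67 & 5 \\
49 & 45 & 17 & 50 & 58 & 2
\end{bmatrix}, \quad
P_5 = \begin{bmatrix}
0 & 2 & 0 & 0 & 2 & 2 \\
1 & 0 & 0 & 0 & 1 & 2 \\
3 & 1 & 0 & 4 & 2 & 0 \\
2 & 1 & 0 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 & 0 & 2 \\
5 & 2 & 4 & 4 & 3 & 0
\end{bmatrix}.$$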


























Finally $P_6 = 0$, since except for diagonal entries $R_6 = M^6$ by the reason-
ing of the last sentence of Section 2.1. On substituting this particular matrix
M into formula (6), we obtained $R_6$. The matrix $R_6$ thus obtained was
identical with $M^6$ except on the main diagonal. This provided an empirical
check of formula (6).
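An independent check on the matrices $P_s$ is to count non-redundant paths by direct search rather than through the formulas for $R_s$. The following sketch assumes the matrix M of the illustration and treats a pure s-step path as one that visits s + 1 distinct persons; the function name `pure_paths` is introduced here for illustration only.

```python
# Choice matrix M of the illustration (row i: persons chosen by person i + 1).
M = [
    [0, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
]
n = len(M)


def pure_paths(s):
    """Matrix whose (i, j) entry counts s-step paths from i to j with no person repeated."""
    P = [[0] * n for _ in range(n)]

    def extend(start, node, visited, steps):
        if steps == s:
            P[start][node] += 1
            return
        for nxt in range(n):
            # Follow a choice only if it leads to a person not yet on the path.
            if M[node][nxt] and nxt not in visited:
                extend(start, nxt, visited | {nxt}, steps + 1)

    for i in range(n):
        extend(i, i, {i}, 0)
    return P


P4, P5, P6 = pure_paths(4), pure_paths(5), pure_paths(6)
for name, P in (("P4", P4), ("P5", P5), ("P6", P6)):
    print(name)
    for row in P:
        print(row)
```

With six persons, a 6-step path would need seven distinct persons, so the search necessarily returns a zero matrix for s = 6, in agreement with $P_6 = 0$.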


7. Remarks on the General Case 
Given any communication matrix M, any positive integer s, and enough
time, one can find $R_s$ by the following process:
(a) write all the g = C(s,2) — 1 partitions of s: 
Jus Fay sa Fas 


: . 8 1 
(b) write all of the u = o([t4 
(c) determine all the general paths corresponding to the consistent col- 
lections of partitions, 
(d) deduce the matrices of these general paths, and 
(e) substitute these matrices into formula (2): 


R, = RY — RY + RY — --> + (—1) "Re. 


It would appear desirable to have a recursion formula for $R_s$, i.e., a
relationship expressing $R_s$ in terms of those $R_t$ for which t < s. Although


| 2) consistency tables, 













several partial results in this direction have been obtained, some of which 
may be fruitful, we shall not include these attempts here. 


REFERENCES 

1. Festinger, Leon. The analysis of sociograms using matrix algebra. Human Relations, 
1949, 2, 153-158. 

2. Katz, Leo. An application of matrix algebra to the study of human relations within 
organizations. Institute of Statistics, University of North Carolina, Mimeograph 
Series, 1950. 

3. Luce, R. D., and Perry, A.D. A method of matrix analysis of group structure. Psycho- 
metrika, 1949, 14, 95-116. 

4. Weiss, Marie J. Higher algebra for the undergraduate. Wiley, 1949, pp. 90-144. 


Manuscript received 9/6/51 


Revised manuscript received 11/6/51 











PSYCHOMETRIKA—VOL. 17, NO. 2 
JUNE, 1952 


MULTIPLE GROUP METHODS FOR COMMON-FACTOR ANALYSIS: 
THEIR BASIS, COMPUTATION, AND INTERPRETATION 


Louis GUTTMAN 
THE ISRAEL INSTITUTE OF APPLIED SOCIAL RESEARCH 


In a previous paper (1) were developed three basic theorems which were 
shown to provide numerical routines, as well as algebraic proof, for existing 
common-factor methods. New “multiple” routines were also indicated. 
The first theorem showed how to extract as many common factors as one 
wished from the correlation matrix in one operation. The second theorem 
showed how to do the same from the score matrix. The third proved that 
factoring the correlation matrix was equivalent to factoring the score matrix. 
A particular application of these theorems is the multiple group factoring 
method, which the writer first used in practice on some Army attitude scores 
during World War II. The present paper explains the basic theorems in 
more detail with special reference to group factoring. Computations are out- 
lined as consisting of five simple matric operations. The meaning of common- 
factor analysis is given in terms of the basic theorems, as well as the
relationship to “inverted” factor theory.


1. Introduction


Recently, Thurstone (11) has called attention to the similarity between 
a method of Holzinger (6) for extracting several common factors simul- 
taneously and one proposed later by himself (10), namely that of multiple 
groups. Surprisingly, neither of these investigators has taken advantage of 
the complete analysis of multiple group methods that was published earlier 
in this same journal (1). Proof was given there that the numerical procedures 
always succeed in extracting common factors—provided the problem of 
communalities is solved*—and a simple check on the work was provided. 
Since these and other features do not appear in Thurstone’s and Holzinger’s
papers, it seems worth while to review the results of the earliest of the three
treatments, in order to spell out in more detail just what was accomplished
there. Additional interpretation will be given of the meaning of the method,
so that the researcher will more judiciously be able to capitalize on the
possible variations at his disposal for the numerical computations.

*Since this article was written, the writer has developed a more general approach to
factor analysis than that of common factors, which will tentatively be called the “image”
approach (4). Some of the resulting theorems show that the problem of communalities is a
most fundamental one which in general cannot be solved by current “approximation”
procedures; in many cases it admits of no parsimonious solution at all. In one of the
simplest image structures, the image space is only two-dimensional, yet the minimum
possible dimensionality of any common-factor space is $n - 2$, where n is the number of
observed variables. By the “image” of an observed variable is meant its projection on all
the remaining $n - 1$ variables, and the image space is the space of all n projections. In
general, if there is a parsimonious common-factor solution possible, it will coincide with
the image solution as n becomes infinite. But image solutions can be parsimonious even
when there is no parsimonious common-factor space at all. The image approach, then, is
more fundamental than the common-factor approach, and does not depend on solving for
communalities at all; to the contrary, it clarifies the communality problem and shows how
the latter may have no parsimonious solution, even though a parsimonious structure exists.

A simple outline of the calculations will be presented in the form of 
ordinary matrix operations. The statistical meaning of each matrix formed
will be given. 

Factoring may be done either of the correlation matrix or of the score 
matrix. The original paper (1) develops the theorems for both cases, as will 
be elaborated on here. We shall further show the relationship between the 
two factoring processes. This will incidentally provide an analysis of the 
relationship between direct and “inverted” factor analysis. 

Finally, we shall see how the meaning of common factors is derivable 
from the group method. 


2. The Basic Theorem 


In the basic paper (1) the introduction explained that “This paper 
develops a variety of new methods for extracting factors from matrices. It 
gives methods for extracting one factor at a time, and methods for extracting 
several factors at a time, be they oblique or orthogonal.” The multiple group
method is a straightforward application of Theorem 3 of that paper (1, p. 12), 
which will be restated here as Theorem A: 


Theorem A: Let G be a Gramian matrix of order $n \times n$ and of rank
r > 0. Let X be of order $s \times n$ and such that $XGX'$ is non-singular. Then the
residual matrix

$$G_1 = G - GX'(XGX')^{-1}XG \tag{1}$$

is of rank $r - s$ and is Gramian.
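A small numerical illustration of Theorem A may help fix ideas. The sketch below uses exact rational arithmetic from the Python standard library; the matrices A and X are hypothetical values chosen only so that G = AA' is Gramian of rank 3 and XGX' is nonsingular, and the helper functions are written out for this example rather than taken from any library.

```python
from fractions import Fraction


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def rank(A):
    """Rank by exact Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in A]
    r = 0
    for c in range(len(A[0])):
        pivot = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(len(A)):
            if i != r and A[i][c] != 0:
                f = A[i][c] / A[r][c]
                A[i] = [x - f * y for x, y in zip(A[i], A[r])]
        r += 1
    return r


def inverse(A):
    """Exact Gauss-Jordan inverse (A is assumed nonsingular)."""
    n = len(A)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(A)]
    for c in range(n):
        pivot = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        aug[c] = [x / aug[c][c] for x in aug[c]]
        for i in range(n):
            if i != c and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]


# Hypothetical example: n = 4 variables, rank r = 3, s = 2 factors extracted at once.
A = [[1, 0, 0], [1, 1, 0], [0, 1, 1], [1, 0, 1]]
G = matmul(A, transpose(A))                    # Gramian, rank 3
X = [[1, 1, 0, 0], [0, 0, 1, 1]]               # weight matrix of ones and zeros

XG = matmul(X, G)
core = inverse(matmul(XG, transpose(X)))       # (XGX')^{-1}
G0 = matmul(matmul(transpose(XG), core), XG)   # GX'(XGX')^{-1}XG, using G = G'
G1 = [[g - g0 for g, g0 in zip(rg, r0)] for rg, r0 in zip(G, G0)]

print(rank(G), rank(G1))
```

Exact fractions are used so that the rank of the residual is not blurred by rounding; with floating-point arithmetic one would instead judge rank from how small the residual entries are.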


It was shown how this theorem includes all previous correlation factoring 
methods, and in addition provides new techniques that enable as many 
common factors to be extracted in one operation as one wishes. Different 
methods differ solely by their choice of the weight matrix X. 


3. Interpretation of the Matrices 


Let us first interpret the matrices in equation (1) for common-factor 
analysis. Later we shall summarize the practical computing steps. 

The matrix G to be factored is the correlation matrix of the n observed 
tests or variables, with communalities in the principal diagonal. 

The number of common factors to be extracted in one operation can 
be selected arbitrarily by the worker, provided it does not exceed r, the total 
number. The number chosen to be extracted is denoted by s. In particular, 
one can choose s to equal r (if r is known in advance), in which case all 
common factors will be extracted in one operation. 










The chosen number s determines the number of rows for the weight 
matrix X. The number of columns is n, the number of observed variables. 
It is now up to the worker to decide upon the s X n elements of X, which 
can be chosen almost arbitrarily. It is desirable, however, to try to choose 
them a priori so as to avoid or reduce the problem of rotation of axes* after- 
wards, and in particular to have the common factors named before one 
begins. This is not essential from a purely algebraic point of view, but is 
most helpful for psychological meaning. There is of course a great practical 
saving in arithmetic if a simple structure, say, is obtained immediately from 
X without need for a rotation of axes. 

We shall discuss the choice of the elements of X in more detail later, 
but go on to interpret equation (1) for any X, no matter how chosen, and 
no matter what structure results therefrom. 

Let $F_0'$ denote the $s \times n$ product XG:

$$F_0' = XG. \tag{2}$$

Then $F_0'$ is the matrix of the covariances of the s common factors with the n
tests. It is convenient to define the transpose of $F_0$ by (2), and $F_0$ itself is
of course the transpose of $F_0'$.

At this stage, the common-factor scores implied are not necessarily in 
standard form; their means are zero, to be sure, but their standard deviations 
are not unity in general (cf. 1, p. 8). Hence covariances result in equation (2), 
rather than correlation coefficients. The covariances can be converted later 
into correlation coefficients. 

Let $L_0$ denote the $s \times s$ matrix $XGX'$:

$$L_0 = XGX'. \tag{3}$$

$L_0$ is the matrix of covariances between the s common factors being ex-
tracted. In particular, the principal diagonal elements of $L_0$ are the variances
of the corresponding common factors.

The only restriction our theorem lays on X is that $L_0$ be nonsingular.
Of course, $L_0 = F_0'X'$, which is the way to compute $L_0$ in practice.

The s common factors will be mutually orthogonal if and only if the
non-diagonal elements of $L_0$ are all zero. Then $L_0$ is simply a diagonal matrix
of variances. Non-zero elements outside the diagonal imply non-zero corre-
lations between the corresponding common factors.

$F_0$ and $L_0$ define the common-factor structure of the s factors extracted.
These two matrices contain all the covariances involved. The covariances
can be converted into correlation coefficients by dividing through by the
proper standard deviations as obtainable from the principal diagonal of $L_0$.


*In the writer’s new theory of order-factors (4), of which the simplex and the circum- 
plex are specific examples, the problem of rotation need not arise at all. In particular, the 
circumplex pattern turns out to be a more precise algebraic expression for what Thurstone 
calls ‘‘simple structure.” 










This will be the procedure indicated below when we review the computing 
steps. 
Since $L_0$ is nonsingular, it has an inverse:

$$L_0^{-1} = (XGX')^{-1}, \tag{4}$$

which is needed to help determine the extent to which the s common factors
reproduce the original correlation matrix G (with communalities in the main
diagonal). Let the contribution of the s common factors to G be denoted
by $G_0$. Then

$$G_0 = F_0 L_0^{-1} F_0', \tag{5}$$

which is the familiar formula for reproducing correlations from oblique
factors.*

The residual matrix of our theorem, namely $G_1$, is the difference be-
tween the observed and the reproduced correlations:

$$G_1 = G - G_0. \tag{6}$$

Equation (6) is simply equation (1) in more compact notation, as can be
verified from definitions (2), (3), and (5).

Our theorem states that subtracting out $G_0$ actually reduces the rank
of G by s, so that the residual matrix $G_1$ is exactly of rank $r - s$. The theorem
states further that $G_1$ is a Gramian matrix, not merely a symmetric matrix.
Indeed, from another portion of the basic paper (1, p. 8), it can be seen that
$G_1$ is a correlation matrix in turn, containing the correlation coefficients
between the n tests, the s common factors being held constant, and with
the residual communalities in the main diagonal.

If s is chosen to equal r, then $r - s$ vanishes, and $G_1$ is of rank zero.
That means that $G_1$ is a zero matrix, and $G_0 = G$, or $G_0$ perfectly reproduces
G. In this case, all r common factors have been extracted in one operation.

If s is less than r, then $G_1$ does not vanish. Since it is a correlation matrix,
it can be factored in turn by the selfsame theorem. This time, a new weight
matrix must be chosen in place of the previous X, for it is always true of the old
X that

$$XG_1 = 0. \tag{7}$$

Equation (7) implies that $XG_1X'$ is singular (in fact vanishes), which con-
tradicts a basic condition for further factoring. A new weight matrix, say
$X_1$, must be chosen so as to make $X_1G_1X_1'$ nonsingular; and then the factoring
theorem can be applied again for extracting additional common factors.
Equation (7) is to be used in practice as a check on the arithmetic.
This simple check may be added also to Thurstone’s and Holzinger’s in-
structions. It is actually a generalization of the check for the single centroid
method, where the sum of residuals in each column is zero. The check for
the more general method of any X is that the weighted sum of residuals in
each column is zero.

*Each matrix on the right is in terms of covariances, to be sure, but the total product
is actually independent of change in standard deviation, because of the inverted matrix in
the middle. Hence both members of (5) are actual (partial) correlation matrices, not
merely covariance matrices.
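The check in equation (7) can be verified in one line from equation (1), using only the condition already imposed that $XGX'$ be nonsingular:

```latex
\begin{aligned}
XG_1 &= XG - XGX'(XGX')^{-1}XG \\
     &= XG - XG \\
     &= 0 .
\end{aligned}
```

For weights of ones and zeros, row j of $XG_1$ is the sum of those rows of the residual matrix that belong to group j, so that the residuals in each column should sum to zero within every group, apart from rounding.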


4. Choosing the Weight Matrix X 


The writer’s first application of this theorem was during World War II, 
when he extracted four common factors simultaneously from a set of scores 
on fifteen attitude areas* of “morale.” The data were from a survey of a
cross section of the American Army made by the Research Branch of the 
Information and Education Division, A.S.F. These areas are described in 
one of the recent volumes on the Research Branch’s work (8), but the factor 
analysis study of them is as yet unpublished. 

It was hypothesized that four common factors existed for the data and 
furthermore that the fifteen areas could be divided into four groups such 
that each common factor could (a) be named according to the general content 
of one of the groups and—more important from the arithmetical point of 
view—(b) be estimated fairly well from the areas in that group alone. Our 
general theorem was then employed to extract these four common factors 
simultaneously by means of a simply constituted weight matrix X. Since
the elements of X were chosen according to the a priori grouping of the
observed variables, we called this—as does Thurstone—a group method of 
factoring. 

The resulting common factors in the morale example turned out to be 
somewhat oblique and virtually a simple structure, so that no rotation of 
axes was necessary afterwards. 

Let us see how the weights in X are chosen for group factoring for such 
an example. 

The number of observed variables is n = 15. The number of hypothesized 
common factors is r = 4, and the number to be extracted in the first operation 
is s = 4. That is, here all the common factors are intended to be extracted 
in one operation, or r = s. The weight matrix must accordingly be chosen 
to be of order 4 X 15: one row for each common factor and one column for 
each observed variable. 

The simplest weights to use are ones and zeros. We hypothesized the 
first three variables to identify one common factor, so we defined the first 
row of X to have unity as its first three elements, and zero for each of the 
last twelve elements: 


1 1 1 0 0 0 0 0 0 0 0 0 0 0 0


*Each area was a scale or quasi-scale, so that each was actually unidimensional and
could be represented by a single score (cf. 9 for the most complete discussion available of
scale analysis).










The next five variables, the fourth through the eighth, were believed to
identify a second common factor, so the second row of X was defined to
consist of three zeros, five ones, and seven zeros, in that order:

0 0 0 1 1 1 1 1 0 0 0 0 0 0 0

The third row of X was defined to be:

0 0 0 0 0 0 0 0 1 1 1 0 0 0 0

or variables 9, 10, and 11 were grouped together to define a third common
factor. The fourth and last row of X was chosen to be:

0 0 0 0 0 0 0 0 0 0 0 1 1 1 1

indicating that the last four variables were grouped to locate a fourth com-
mon factor.

The weight matrix X used in our computations was then composed of
the four rows just described, or:

$$X = \begin{bmatrix}
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1
\end{bmatrix} \tag{8}$$





The factoring process itself was accomplished by straightforward matrix 
multiplication, according to our general Theorem A, as outlined below. 
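The construction of a group weight matrix can be stated compactly. The sketch below builds the 4 × 15 matrix of ones and zeros from the group sizes 3, 5, 3, and 4 of the morale example; the matrix G used to show that XG amounts to adding rows within groups is an arbitrary placeholder, not the Army data, which are not reproduced in this paper.

```python
group_sizes = [3, 5, 3, 4]      # variables 1-3, 4-8, 9-11, and 12-15
n = sum(group_sizes)

# One row of X per group: ones for the variables in the group, zeros elsewhere.
X = []
start = 0
for size in group_sizes:
    X.append([1 if start <= k < start + size else 0 for k in range(n)])
    start += size

for row in X:
    print(" ".join(str(w) for w in row))


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Placeholder symmetric matrix standing in for G (arbitrary numbers, not real data).
G = [[i + j for j in range(n)] for i in range(n)]

XG = matmul(X, G)
# The first row of XG is simply the sum of rows 1, 2, and 3 of G.
first_group_sum = [G[0][j] + G[1][j] + G[2][j] for j in range(n)]
print(XG[0] == first_group_sum)
```

Overlapping groups or differential weights require only that the corresponding entries of X be changed; the matrix multiplication itself is unaffected.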


5. The Computing Procedure 


We shall state the computing procedure in five steps, for any choice of
X, but referring to the morale example for illustration.

1. Compute XG, and denote the product by $F_0'$. For group factoring,
this simply means adding the rows of G for each group separately, to obtain
partial sums. $F_0'$ is of order $s \times n$ (4 × 15 for our example) and contains
the covariances between the respective observed variables and common
factors. The element in row j and column k of $F_0'$ is the covariance between
the jth common factor and the kth observed variable.

2. Compute $F_0'X'$ and denote the product by $L_0$. For group factoring,
this simply means adding the columns of $F_0'$ for each group separately. $L_0$ is
of order $s \times s$ (4 × 4 in our example) and consists of the covariances between
the s (= 4) common factors being extracted.

3. Let D denote a diagonal matrix with the same principal diagonal as
$L_0$, or the diagonal matrix of the variances of the s common factors. Com-
pute $D^{1/2}$, the diagonal matrix of standard deviations. Then $D^{-1/2}$ consists of
the reciprocals of the standard deviations.

4. Transform the covariances of $L_0$ into correlation coefficients by dividing
by the standard deviations of $D^{1/2}$, and denote the correlation matrix between
the common factors by L:

$$L = D^{-1/2} L_0 D^{-1/2}.$$

As a check on the work, the principal diagonal of L should contain only
ones, or the self-correlations of the common factors.

5. Similarly, transform the covariances of $F_0$ into correlation coefficients
by dividing by the standard deviations in $D^{1/2}$, and call the resulting correla-
tion matrix F:

$$F = F_0 D^{-1/2}.$$

Then F is the $n \times s$ matrix of test correlations with the common factors.

To check how well the common factors reproduce the original correlation
matrix G, compute the reproduced correlation matrix $G_0$ by the following
steps.

6. Compute $L^{-1}$.

7. Compute $FL^{-1}$, and then $(FL^{-1})F'$. Denote this last product by $G_0$;
here are the reproduced correlations, with reproduced communalities in the
diagonal.

8. Compute the residual matrix $G_1 = G - G_0$. If the elements of $G_1$
are all very small, then the factoring process is over. Otherwise, $G_1$ can be
factored in turn to yield additional common factors.
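The eight steps can be followed through on a small artificial example. The sketch below is only an illustration under stated assumptions: the loadings A and factor correlations PHI are hypothetical values used to manufacture a reduced correlation matrix G with two oblique common factors, the weight matrix X groups the six tests three and three, and the helper functions are written for this example.

```python
import math


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (A is assumed nonsingular)."""
    n = len(A)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for i in range(n):
            if i != c:
                f = aug[i][c]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]


# Hypothetical data (not from the paper): six tests, two oblique common factors.
A = [[0.8, 0.0], [0.7, 0.0], [0.6, 0.1], [0.0, 0.7], [0.1, 0.6], [0.0, 0.5]]
PHI = [[1.0, 0.3], [0.3, 1.0]]
G = matmul(matmul(A, PHI), transpose(A))       # communalities fall on the diagonal
X = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]   # two groups of three tests

F0t = matmul(X, G)                             # step 1: F0' = XG
L0 = matmul(F0t, transpose(X))                 # step 2: L0 = F0'X'
s = len(L0)
sd = [math.sqrt(L0[j][j]) for j in range(s)]   # step 3: diagonal of D^(1/2)
L = [[L0[j][k] / (sd[j] * sd[k]) for k in range(s)] for j in range(s)]   # step 4
F = [[F0t[j][i] / sd[j] for j in range(s)] for i in range(len(G))]       # step 5
Linv = inverse(L)                              # step 6
G0 = matmul(matmul(F, Linv), transpose(F))     # step 7: reproduced correlations
G1 = [[g - g0 for g, g0 in zip(rg, r0)] for rg, r0 in zip(G, G0)]        # step 8

max_residual = max(abs(x) for row in G1 for x in row)
check = max(abs(x) for row in matmul(X, G1) for x in row)   # equation (7)
print(max_residual, check)
```

Because the artificial G has rank two and two factors are extracted, the residuals vanish apart from rounding error; with real data and estimated communalities they would only be small.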

The reader can verify that Thurstone’s example in (8) follows this 
routine, if one expresses the groupings there in terms of a weight matrix X 
of ones and zeros. 

If a simple structure is sought but not found in step 5, then it is con- 
venient to orthogonalize the common factors as a preliminary to rotation of 
axes. This has been suggested as routine procedure by Thurstone (10). A 
drawback is that if rotation is actually necessary, this implies that original 
hypotheses about the nature of the common factors are wrong. A posteriori 
hypotheses, made after inspection of the data, may be subject to all the 
uncertainties and controversies which beset any a posteriori theory. The 
results of a factor analysis would seem to be more trustworthy if the weight 
matrix X is chosen according to an a priori psychological theory, which is 
then tested by the data. This kind of procedure carries more scientific weight 
than the procedure of constructing an a posteriori theory to fit already known 
facts. The latter is implied by rotation of axes (whether or not the rotation
is “blind”).


6. More About the Choice of X 


The simplest a priori theory is that of essentially non-overlapping groups, 
namely that each observed variable (or test) can be put into one and only 













one group for the purpose of defining the common factors. The simplest 
weights in this case are then ones and zeros, according as the observed variable 
does or does not belong in a designated group. Within a group, the weights 
need not be restricted to unity, but indeed ideally should be the regression 
weights of the hypothesized common factor on the variables involved.* 
Fortunately, when the number of variables in a group exceeds three or four, 
then differential weights will ordinarily make little difference, so that uniform 
weights of unity are often adequate. 

Similarly, the uniform weights of zero given to variables outside a group
are not necessarily strictly correct. Again what are required are true re- 
gression weights for estimating the hypothesized common factor. Giving 
weights of zero in practice implies hypothesizing relatively small regression 
weights, and is often an adequate procedure. 

Strictly speaking, then, the notion of groups is not an all-or-none propo- 
sition, even though weights of unity and zeros are used for convenience. 
These dichotomous weights merely represent a dividing of the true weights 
into two groups—relatively large regression weights and relatively small 
ones—and are but approximations to the best weights. Since the best weights 
are usually difficult to hypothesize a priori, it is fortunate that the ap- 
proximate weights can usually be sufficient. 

In some cases it may be felt a priori that a given observed variable 
should be given weight for more than one common factor. This can be done 





by modifying X accordingly, giving this variable two or more non-zero 
entries in its column in X. The matric equations still hold, for they do not 
presume that non-overlapping groups are used. The groups may overlap as 
much as one pleases. 

In other cases within a group it may be felt that a particular variable 
should be given a larger weight than the others. This may be done by writing 
the larger weight in X in the appropriate cell, and following the matric 
multiplications through with this kind of X. 

One may use negative weights if a regression weight of a common factor 
on a variable should be negative. This does not change the matric equations, 


which are to be followed through as usual. 


7. Factoring in Two or More Operations 


If s is chosen to be less than r, then more than one application of the
theorem is required to extract all the common factors. The residual matrix
$G_1$, resulting from weight matrix X, will not vanish. A new weight matrix
$X_1$ must be applied in turn to $G_1$ to extract some or all of the $r - s$ remaining
common factors.

*The regression implied is only within the common-factor space, namely on total
observed scores minus unique-factor scores.

The s common factors resulting from the original X may be oblique to
each other, and the remaining $r - s$ common factors also may be mutually
oblique. However, it is a necessary consequence of the factoring process that
the s common factors are all orthogonal to the remaining $r - s$ common factors.

Similarly, if $X_1$ is used to extract part of the $r - s$ remaining common
factors, then those extracted by $X_1$ are automatically orthogonal to those
that still remain (as well as to those previously extracted by X).

In other words, common factors extracted in separate stages are auto- 
matically orthogonal between stages, whether or not they are orthogonal 
within a stage. 

If the underlying structure sought is believed to be entirely oblique, 
then all r common factors must be extracted in one operation if a rotation
is to be avoided. Using two or more operations will automatically introduce 
orthogonality. 

The orthogonality property between stages described here generalizes 
that of the simple centroid method. There, one common factor is extracted 
at a time, so that the resulting common factors are all mutually orthogonal, 
since each comes from a distinct operation. 
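The orthogonality between stages can be traced to equation (7). The following is a sketch under assumptions not spelled out in this section: that $G = \frac{1}{N}SS'$ for a reduced score matrix S, that the first-stage factor scores are XS, and that the second-stage factor scores are $X_1S_1$, where $S_1 = S - GX'L_0^{-1}XS$ is the corresponding residual score matrix.

```latex
\begin{aligned}
\frac{1}{N}\, S S_1' &= \frac{1}{N}\, S S' - \frac{1}{N}\, S S' X' L_0^{-1} X G
                      = G - G X' L_0^{-1} X G = G_1 , \\
\frac{1}{N}\, (XS)(X_1 S_1)' &= X \left( \frac{1}{N}\, S S_1' \right) X_1'
                      = X G_1 X_1' = 0 .
\end{aligned}
```

The last equality holds for any second-stage weights $X_1$ whatsoever, since $XG_1 = 0$; this is why the orthogonality arises automatically rather than by choice.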


8. The General Theorem for Factoring Scores 

All the prevalent computing procedures concentrate on factoring the 
correlation matrix—so much so that there is serious danger of overlooking 
the basic meaning and problem of factor analysis, which are concerned with 
the original scores. Factoring correlation coefficients is but an indirect—and 
often economical—method of factoring scores, and derives its meaning only 
from the scores involved. It is the observed scores that are to be divided into 
common- and unique-factor scores. Ultimately, one wants to determine these 
common- and unique-factor scores for each individual and not just compute 





over-all test loadings on the factors. 

Holzinger has suggested a method of factoring the scores directly (7) 
which was also anticipated in (1). The latter paper gave a full treatment 
and proof for extracting as many common factors as one wishes at a time 
directly from the scores. 

The basic theorem proved in (1), which indicates the computations re- 
quired, will be repeated here as Theorem B: 

Theorem B. Let S be any matrix of order $n \times N$ and of rank r > 0.
Let X and Y be of orders $s \times n$ and $s \times N$, respectively (where $s \le r$), and
such that $XSY'$ is nonsingular. Then the residual matrix

$$S_1 = S - SY'(XSY')^{-1}XS \tag{9}$$

is exactly of rank $r - s$.

In factoring scores, two arbitrary weight matrices are at the computer’s
disposal: X and Y. For the theory of common factors, S is the matrix of
observed scores on n tests or variables for N individuals with unique-factor
scores subtracted out. That is, the factoring is to be done only within the
common-factor space. This is equivalent to factoring the correlation matrix
with communalities in the main diagonal instead of the total variance.
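Theorem B can be illustrated in the simplest case, s = 1, where $XSY'$ is a single number and its inverse is a reciprocal. The sketch below uses a hypothetical 3 × 4 score matrix of rank 2 and exact rational arithmetic; the function names and the particular weights are assumptions made for the example.

```python
from fractions import Fraction


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def rank(A):
    """Rank by exact Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in A]
    r = 0
    for c in range(len(A[0])):
        pivot = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(len(A)):
            if i != r and A[i][c] != 0:
                f = A[i][c] / A[r][c]
                A[i] = [x - f * y for x, y in zip(A[i], A[r])]
        r += 1
    return r


def residual(S, X, Y):
    """S_1 = S - SY'(XSY')^{-1}XS for the case s = 1 (X is 1 x n, Y is 1 x N)."""
    XS = matmul(X, S)                      # 1 x N
    SYt = matmul(S, transpose(Y))          # n x 1
    c = matmul(XS, transpose(Y))[0][0]     # XSY', a single nonzero number here
    return [[Fraction(S[i][j]) - Fraction(SYt[i][0] * XS[0][j], c)
             for j in range(len(S[0]))] for i in range(len(S))]


# Hypothetical scores: 3 tests, 4 individuals; the third row is the sum of the first two.
S = [[1, 2, 0, 1], [0, 1, 1, 1], [1, 3, 1, 2]]
X = [[1, 0, 0]]                            # weights on tests
S1_a = residual(S, X, [[1, 0, 0, 0]])      # one choice of weights on individuals
S1_b = residual(S, X, [[0, 1, 0, 0]])      # another choice of Y

print(rank(S), rank(S1_a), rank(S1_b))
```

Either choice of Y lowers the rank by exactly one, and in both cases $XS_1 = 0$, the score analogue of the check in equation (7).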

We immediately see an advantage to factoring the correlation coeffi- 
cients, rather than the score matrix. In the former case, only communalities 
need to be estimated a priori, whereas in the latter case each individual’s 
unique score on each test should be known. 

One of the virtues of the group method of factoring is that, if weights 
of zero and unity are used, and if a large number of variables is in each 
group, then it makes little difference whether the communalities are esti- 
mated closely, for they become but a small fraction in the summations 
involved. If one wishes, unity can be used throughout in the diagonal without 
appreciably affecting the results. (The same property can hold with weights 
more general than zero and unity in X; it can sometimes fail when negative 
weights are used, as is implied by reflection of axes in the centroid method, 
for example). 

The writer has a proof that a similar property holds for the factoring 
of scores. Here the crucial matrix is X rather than Y. In equation (9) of the 
theorem above, X can be defined again as consisting of zeros and ones, 
delimiting groups of variables as in (8). It can be shown that if there is a 
large enough number of variables in each of the s groups, then it is almost 
immaterial whether the total score matrix is used for S, rather than the S
reduced by having unique-factor scores subtracted out. The reduced matrix 
is the one which will be reproduced, not the total scores, even if a non- 
reduced S is used. Correspondingly, the reduced S yields the reduced corre- 
lation matrix (with communalities in the main diagonal). 

The proof of this last proposition will be included in one of a series of 
papers in preparation on a new approach to the analysis of the structure of 
data. In particular, the writer’s theory of nodal analysis, to be published, 
contains common-factor analysis as a special case, and provides new theorems 
and computing procedures for it. The new methods parallel the group- 
factoring methods of this paper, but make a different emphasis. 

9. Direct and Inverted Factor Theories 

Let us now investigate the meaning of the terms of equation (9). Each
of the n rows of S belongs to an observed variable or test (with unique-
factor scores subtracted out), and each of the N columns belongs to an
individual or respondent from the population.

The arbitrary matrix X defines a weighting (or in particular, a grouping)
of the tests, and the arbitrary matrix Y defines a weighting (or a grouping)
of the individuals.

The symmetry in (9) between the roles of X and Y suggests the algebraic 

























duality of factor analysis: People can be regarded as factored as well as tests. 
Equation (9) formally implies what some have called inverted or individual 
factor analysis, as well as ordinary test factoring. A caution, of course, is that 
the formal algebra takes cognizance of neither statistical nor psychological 
meaning. Proponents of inverted analysis still have to solve the problem of 
metric along individuals, and the meaning of “common factor” and “unique 
factor” in this case. 

It should be observed that the choice of Y in no way affects the common- 
factor space of tests as studied by X. Any Y (for which $XSY'$ is nonsingular)
will yield the same results as any other as far as the test common-factor space 
goes. Therefore, one need not worry about the meaning of inverted analysis 
in using our theorem as far as ordinary common-factor theory is concerned. 
(Similarly, inverted theory need not worry about the meaning of ordinary 
common-factor theory; different choices of X will not affect the results from 
a given Y.) 


10. The Relation of Score Factoring to Correlation Factoring 


Ultimately, the meaning of factor theory rests on the factoring of scores. 
Factoring a correlation matrix is of no value if it does not imply that thereby 
the underlying scores are being correspondingly factored. Fortunately, as the 
writer has shown in the same basic paper being reviewed here (1), this implica- 
tion is always true. One can always be safe in assuming that if a correlation 
matrix is factored in a given way, then corresponding score factor matrices can 
be computed. This is summarized in another theorem proved in (1). The 
theorem is stated as for orthogonal common factors, but can easily be extended 
to oblique factors as well. 


Theorem C. If

$$R = \frac{1}{N} S S' = F F', \tag{10}$$

where S is of order n × N and of rank r, and F is of order n × r (and of rank r),
then it is possible to determine a P of order r × N and such that

$$S = F P. \tag{11}$$

Furthermore, this P is uniquely determined, and it satisfies

$$\frac{1}{N} P P' = I. \tag{12}$$

The matrices involved have the following interpretation. S is the re- 
duced observed score matrix (observed minus unique-factor scores), the 
observed scores having zero means and unit variances. 
R is the reduced correlation matrix (communalities in main diagonal). 










F is a matrix of orthogonal common-factor loadings, obtained by any method 
whatsoever from R. 

P is the set of r common-factor scores for each of the N individuals. 

The theorem states that if common-factor loadings F are computed 
directly from R, then this uniquely defines common-factor scores P. 
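
A minimal numerical sketch of Theorem C may help fix ideas. The toy matrices below are hypothetical, the helper names (`matmul`, `transpose`, `inv2`, `scale`) are introduced only for illustration, and the least-squares formula P = (F'F)^{-1}F'S is an assumption about how one might compute P in practice; the theorem itself asserts only that P exists and is unique. The toy scores are not rescaled to unit variance.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def inv2(m):
    """Inverse of a 2 x 2 matrix (enough for r = 2 factors)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def scale(a, k):
    return [[k * x for x in row] for row in a]

# Hypothetical toy data: n = 3 tests, r = 2 factors, N = 4 individuals.
N = 4
F = [[Fraction(1), Fraction(0)],
     [Fraction(1), Fraction(1)],
     [Fraction(0), Fraction(2)]]
P_true = [[Fraction(v) for v in (1, 1, -1, -1)],
          [Fraction(v) for v in (1, -1, 1, -1)]]

S = matmul(F, P_true)                                  # equation (11)
R = scale(matmul(S, transpose(S)), Fraction(1, N))     # left side of (10)
factors_R = (R == matmul(F, transpose(F)))             # right side of (10)

# Recover P from F and S alone (assumed least-squares route).
Ft = transpose(F)
P = matmul(matmul(inv2(matmul(Ft, F)), Ft), S)

identity = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
recovered = (P == P_true)
orthonormal = (scale(matmul(P, transpose(P)), Fraction(1, N)) == identity)  # (12)
print(factors_R, recovered, orthonormal)
```

Exact rational arithmetic is used so that the three checks are equalities rather than approximations.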

More generally, the same can be proved for oblique factors. If

$$R = \frac{1}{N} S S' = F L^{-1} F', \tag{13}$$

then again there exists one and only one common-factor score matrix P such
that

$$S = F L^{-1} P, \tag{14}$$

and

$$\frac{1}{N} P P' = L. \tag{15}$$


Theorem C assumes the common-factor intercorrelations L to be zero
outside the main diagonal, so that L = I. This restriction is unnecessary
and is easily removed. If the common factors are not orthogonal, then
we may orthogonalize them arbitrarily, apply Theorem C, and then convert 
back to the original oblique factors. This extends the theorem to oblique 
factors, as indicated by equations (13), (14), and (15). 


11. The Meaning of Common Factors 


Again, let S be the reduced observed score matrix. In Theorem B, let 

s = r, so that all common factors are extracted at once, and then 


$$S = S Y' (X S Y')^{-1} X S. \tag{16}$$

Equation (16) is actually equation (14) in different notation. From this 
equivalence, we learn a basic interpretation of common factors which is often 
overlooked. 

Equation (14) states that the n reduced observed test scores are weighted 
sums of the r common-factor scores. It is equally important to realize that, in 
turn, the common-factor scores are nothing but weighted sums of the reduced 
test scores. This is implied in equation (16), as will now be indicated, the 
weights being those provided by X. 

Indeed, let

$$P = X S, \tag{17}$$

and let

$$F = S Y' (X S Y')^{-1}. \tag{18}$$

The computation of P according to (17) is a first step in the factoring 
computations according to Theorem B, whether or not s = r. Equation (17) 
states that the scores P on the common factors are weighted sums of the
scores S, the weights being taken from X. This indicates precisely how X
determines what common factors are being extracted, whether by Theorem 
A, for correlations, or Theorem B for scores. 
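
The roles of X and Y in (16) to (18) can be checked on a small case. The following is only a sketch under stated assumptions: the rank-2 score matrix, the weights `X`, and the two weightings `Y1` and `Y2` are hypothetical, and the function names are invented for illustration. It shows that two different choices of Y give the same F, and that F and P together reproduce S when s = r.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def frac(rows):
    return [[Fraction(v) for v in row] for row in rows]

# Hypothetical reduced scores of rank r = 2: n = 3 tests, N = 4 individuals.
A = frac([[1, 0], [1, 1], [0, 2]])
B = frac([[1, 1, -1, -1], [1, -1, 1, -1]])
S = matmul(A, B)

X = frac([[1, 1, 0], [0, 1, 1]])           # weighting (grouping) of the tests
Y1 = frac([[1, 0, 0, 0], [0, 1, 0, 0]])    # one weighting of the individuals
Y2 = frac([[1, 2, 0, -1], [0, 1, 3, 1]])   # a quite different weighting

def factor_scores(X, S):
    """Equation (17): P = XS."""
    return matmul(X, S)

def loadings(X, S, Y):
    """Equation (18): F = SY'(XSY')^{-1}."""
    SYt = matmul(S, transpose(Y))
    return matmul(SYt, inv2(matmul(X, SYt)))

P = factor_scores(X, S)
F1 = loadings(X, S, Y1)
F2 = loadings(X, S, Y2)

same_loadings = (F1 == F2)            # the choice of Y does not matter
reproduces_S = (matmul(F1, P) == S)   # equation (16) with s = r
print(same_loadings, reproduces_S)
```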

To see how this ties in with the factoring of correlations, let us compute 
F by using (17). F contains the correlations between the common factors and 
the tests. The covariances required are given by 



$$F_0' = \frac{1}{N} P S'. \tag{19}$$

These are only covariances and not correlations, for according to (17), even 
though each test in S is in standard-score form, the weighted sum P cannot 
be so in general. The means of the factor scores in P are zero, but in general 
the standard deviations are not unity. 

To obtain the right member of (19) from (17), post-multiply both members
of (17) by S’ and divide by N:

$$\frac{1}{N} P S' = \frac{1}{N} X S S'. \tag{20}$$

But (1/N)SS’ is the reduced correlation matrix G of Theorem A, so that
from (19) and (20),

$$F_0' = X G. \tag{21}$$


This is precisely equation (2) which we used in the first step of factoring 
the correlation matrix. 

Similarly, all other matrices can be interpreted from (17). For example, 
let us verify equation (5). The covariance matrix of the common factors is 
$L_0$. This is obtainable from (17) by post-multiplying both members by P’
and dividing by N:

$$L_0 = \frac{1}{N} P P' = \frac{1}{N} X S P'. \tag{22}$$

A comparison of the last member of (22) with (19) shows that

$$L_0 = X F_0 = X G X', \tag{23}$$


which is precisely what was stated in equation (3). 

Therefore, everything that was accomplished in Part I of this paper in 
factoring (reduced) correlation matrices by means of a given X according to 
Theorem A, implies using exactly the same X for factoring (reduced) scores 
by means of Theorem B. The common-factor scores are the same in either 
case, if the same X is used (regardless of what Y is used in Theorem B), and 
mean nothing more than the weighted combinations of (reduced) observed 
scores determined according to the weights given by X. 
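
The chain from (17) through (23) can be traced numerically. This is a sketch only: the scores and weights are hypothetical toy values, the helper names are invented, and the final check (that the covariance loadings, the factor covariances, and the factor scores together rebuild S) is a consequence worked out here rather than a numbered equation of the paper.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def scale(a, k):
    return [[k * x for x in row] for row in a]

def frac(rows):
    return [[Fraction(v) for v in row] for row in rows]

# Hypothetical reduced scores (rank 2) and test weights.
N = 4
S = matmul(frac([[1, 0], [1, 1], [0, 2]]),
           frac([[1, 1, -1, -1], [1, -1, 1, -1]]))
X = frac([[1, 1, 0], [0, 1, 1]])

G = scale(matmul(S, transpose(S)), Fraction(1, N))     # reduced correlation matrix
P = matmul(X, S)                                       # (17)
F0t = scale(matmul(P, transpose(S)), Fraction(1, N))   # (19): F0' = (1/N) P S'
L0 = scale(matmul(P, transpose(P)), Fraction(1, N))    # (22): L0 = (1/N) P P'
F0 = transpose(F0t)

eq21 = (F0t == matmul(X, G))                                   # F0' = XG
eq23 = (L0 == matmul(X, F0) == matmul(matmul(X, G), transpose(X)))
rebuilds_S = (matmul(matmul(F0, inv2(L0)), P) == S)            # S = F0 L0^{-1} P
print(eq21, eq23, rebuilds_S)
```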

This result is carried even further by another theorem developed else- 
where by the writer (2). While equation (17) assumes that S is the reduced
score matrix, this assumption can be dispensed with if an infinite number
of tests is involved. The writer has shown how, with an infinite number of 
tests, the common factors are perfect linear functions of the total observed 
scores, so that unique factors need not be subtracted out. The common 
factors are weighted sums of the original test scores as is. 

This is an algebraic meaning of common factors which must be recognized 
in any psychological interpretation. 


REFERENCES 


1. Guttman, Louis. General theory and methods for matrix factoring. Psychometrika, 
1944, 9, 1-16. 

2. Guttman, Louis. Multiple rectilinear prediction and the resolution into components. 
Psychometrika, 1940, 5, 75-99. 

3. Guttman, Louis. The theory of nodal analysis. (In preparation). 

4. Guttman, Louis. A re-analysis of factor analysis. (In preparation). 

5. Guttman, Louis. A new approach to factor analysis (to appear in Paul F. Lazarsfeld, 
Ed., Mathematical thinking in the social sciences, Columbia Univ. Press). 

6. Holzinger, Karl J. A simple method of factor-analysis. Psychometrika, 1944, 9, 
257-262. 

7. Holzinger, Karl J. Factoring test scores and implications for the method of averages.
Psychometrika, 1944, 9, 155-167. 

8. Stouffer, et al. The American Soldier, Vol. I of Studies in Social Psychology in World 
War II, Princeton: Princeton Univ. Press, 1949. 

9. Stouffer, et al. Measurement and Prediction, Vol. IV of Studies in Social Psychology 
in World War II, Princeton: Princeton Univ. Press, 1950. 

10. Thurstone, L. L. A multiple group method of factoring the correlation matrix. 
Psychometrika, 1945, 10, 73-78. 

11. Thurstone, L. L. Note about the multiple group method. Psychometrika, 1949, 14, 
43-45. 


Manuscript received 9/14/50 


Revised manuscript received 9/5/51 

















PSYCHOMETRIKA—VOL. 17, No. 2 
JUNE, 1952 


A TECHNIQUE FOR FACILITATING THE ROTATION OF FACTOR 
AXES, BASED ON AN EQUIVALENCE BETWEEN PERSONS 
AND TESTS 


JOSEPH SANDLER 
TAVISTOCK CLINIC, LONDON 


A technique is outlined which may facilitate the rotation of factor axes to 
a meaningful position. It is based on certain relationships between the re- 
sults of test and person factor analysis, and consists essentially of supple- 
menting the test factor space with tests which are the test-equivalents of 
persons or groups of persons. These persons may be, for instance, well-known 
“types” in the domain being investigated, or even “freaks.” The ways in
which these persons may be selected and used to determine the final rotated
position of the factor axes are discussed.


The Equivalence of Persons and Tests* 


Consider the scores of N persons in n tests. These scores may con- 
veniently be written in a matrix S in “normalized” form, so that SS’ = R, 
the matrix of correlations between the tests. $s_{ji}$ is the score of person i in
test j.

We may write S = FP, where F is the matrix of orthogonal test factors,
and P is the population matrix. We have PP’ = I, an identity matrix. It
will be necessary to consider for the moment the principal-component solution
$F_p$. Then $S = F_p P_p$, and $F_p' F_p = D$, a diagonal matrix.

Instead of calculating R, persons may be analysed by computing the
N × N matrix Q, where Q = S’S. This is a matrix of person product-sums,
where any person product-sum $q_{ih}$ is a measure of the resemblance between
the patterns of test scores made by persons i and h.

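We may write

$$\begin{aligned}
Q &= (F_p P_p)'(F_p P_p) \\
  &= P_p' F_p' F_p P_p \\
  &= P_p' D P_p \\
  &= P_p' D^{1/2} D^{1/2} P_p \\
  &= (D^{1/2} P_p)'(D^{1/2} P_p).
\end{aligned}$$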


*This section is based on an extension of Sir Cyril Burt’s Reciprocity Principle 
(Burt, C. The Factors of Mind, 1940.), which in its original form applied only to the special 
case of a doubly-centred score matrix. The Reciprocity Principle was later extended to 
the singly-centred score matrix as used here (Sandler, J. Brit. J. Psychol., Stat. Sect., 1949, 
2, 180-87.). 



Let $D^{1/2} P_p = G$. Then it is evident that Q may be factored into a person
factor matrix G, and it is clear that G is the principal component solution, for

$$GG' = (D^{1/2} P_p)(D^{1/2} P_p)' = D^{1/2} P_p P_p' D^{1/2} = D, \text{ a diagonal matrix.}$$


If G is normalized by premultiplication by $D^{-1/2}$, we get the matrix $P_p$,
which is at once both the population matrix derived from the factor analysis
of tests and a normalized person factor matrix. F and P may be regarded
as the results of orthogonal transformations of $F_p$ and $P_p$, and it is evident
that the specification equation S = FP has meaning both in terms of test 
and person factor analysis. For tests, F is the test factor matrix and P the 
matrix of scores of the N persons in the reference tests or factors. For persons, 
P is a normalized person factor matrix, and F is a “population” matrix where 
the tests constitute the population. More simply, F is the matrix of scores 
of the reference persons (person factors) in the n tests. 

A specification equation in terms of person factors is as meaningful 
psychologically as one in terms of test factors. It seems valid to consider a 
person’s score in any given test as a function both of the extent to which he 
is a combination of certain reference persons, and the scores which these 
reference persons obtain in the test. 

Any vector in the space P represents both a test and a person. In the 
one case it is a linear combination of the reference axes considered as tests,
while in the second case it is exactly the same combination of the reference 
axes, except that these axes are now reference persons. Similarly, any vector 
in space F is both a test and a person, for F is both the test factor matrix 
and a “test-population” matrix, where the factors represent reference persons. 

If a person is located in space F, the test which has the same set of
direction cosines may be called the test-equivalent of that person. The test-
equivalent is, in fact, a pure measure of the particular combination of factors
possessed by this person, who will do well in those tests which are highly
correlated with his test-equivalent, and badly (i.e., below average) in those
tests which are negatively correlated with it. The score of person p in any
test j may easily be shown to be proportional to the correlation between
his test-equivalent t and test j. For, if we write the score of p in any test
factor m as $x_{mp}$, and the factor loading of test j for factor m as $c_{jm}$, we have

$$s_{jp} = \sum_m c_{jm} x_{mp} = k \sum_m c_{jm} c_{tm} = k r_{jt},$$

where k is a constant for the person p.
















If, instead of a test j, we consider the factor m, we have

$$s_{mp} = x_{mp} = k r_{tm} = k c_{tm},$$

from which it is clear that the co-ordinates of a point which defines the vector
t in the test factor space F are given simply by the factor scores of person p.
Given the factor matrix F, derived from the analysis of the correlation
matrix R, and the scores of person p in the n tests written in the same units
as the original scores in S (i.e., with the same means and standard deviations
for the tests), the factor scores of p are given by

$$x_p = (F'F)^{-1} F' s_p,$$

where $x_p$ is the column vector of factor scores of person p, and $s_p$ is the column
vector of scores of person p in the n tests. By normalizing $x_p$ we obtain the
direction cosines of the test-equivalent t of person p in the F-space.
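
The computation just described can be sketched for a single person. Everything numerical here is hypothetical: the 3 × 2 factor matrix, the person's factor scores `x_true` from which the test scores are generated, and the helper names. The sketch recovers the factor scores, normalizes them to direction cosines, and checks that each test score equals k times the correlation of that test with the test-equivalent.

```python
import math

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical orthogonal factor matrix: n = 3 tests, 2 factors.
F = [[0.8, 0.0],
     [0.6, 0.6],
     [0.0, 0.9]]

# Hypothetical person p: scores generated from known factor scores.
x_true = [[1.25], [1.5]]
s_p = matmul(F, x_true)

# Factor scores of p: x_p = (F'F)^{-1} F' s_p.
Ft = transpose(F)
x_p = matmul(matmul(inv2(matmul(Ft, F)), Ft), s_p)

# Normalizing x_p gives the direction cosines of the test-equivalent t.
k = math.sqrt(sum(row[0] ** 2 for row in x_p))
t = [row[0] / k for row in x_p]

# Correlation of each test j with t, and the check s_jp = k * r_jt.
r_jt = [sum(c * d for c, d in zip(row, t)) for row in F]
proportional = all(math.isclose(s[0], k * r) for s, r in zip(s_p, r_jt))
print(t, proportional)
```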


The Practical Use of Test-equivalents 


The test-equivalent of any person can be introduced into the test factor- 
space F, the only major time-consuming operation being the calculation of 
$(F'F)^{-1}$ if the number of factors is large.

The addition of test-equivalents to the factor space has a number of 
applications to the practical problem of determining the position to which 
the factor axes should be rotated. Some of these applications are given below. 

1. If the psychological nature of the tests in the battery is to be taken 
into account in determining the final rotated position of the factor axes, then 
the addition of the test-equivalents of a selected group of persons will increase 
our knowledge of the psychological meaning of the factors. Thus in the 
factor analysis of intelligence tests, people with exceptional talents, or people 
who are highly gifted in some particular way, if introduced into the test 
factor space, may throw a great deal of light on the nature of the factors 
present. Similarly, persons with some special cognitive defect may also be 
of value. 

2. The technique may be used as a test of the adequacy of the psycho- 
logical interpretation of a set of rotated factors. The test-equivalents of 
persons who show the characteristics attributed to a factor may be added; 
and, if the psychological interpretation of the factors is correct, these vectors 
should cluster close to the factor in the factor space. If, for instance, the 
hypothesis is made that a memory factor is present in a battery of intelligence 
tests, then people with exceptionally good or weak memories (as assessed by 
criteria other than the tests) may be introduced into the factor space as a 
check on the validity of the hypothesis. 

3. In the search for simple structure, the effort involved in factoring a 
large number of tests is great. Yet, on the whole, the larger the number of
points in the factor space, the more nearly unique is the simple structure
likely to be. The number of points in the factor space may be substantially 
increased by the introduction of a carefully selected set of persons or groups 
of persons into the factor space. 

4. Thurstone has urged the use of “freaks” in factor analysis. These
may be “much more revealing of the underlying nature of the domain than
are carefully randomized samples from the general population.”* People who
show exceptional deviation from the average in any one direction may, if
their test-equivalents are plotted in the factor space, contribute much to the 
identification of the factors present. This approach may have special value 
in the analysis of physical measurements. 

5. In the domain of temperamental traits, the claim has often been 
made that certain factors are present in normal subjects, and that these 
factors are noticeable in their extreme form in certain mental illnesses. It 
should be possible to test claims such as these by the introduction of the 
test-equivalents of typical psychotics or groups of psychotics into a factor 
space derived from the analysis of the temperamental characteristics of 
normal individuals. If the test-equivalents of psychotic patients can be used 
to account for a large part of the variance of the tests due to the common 
factors in the normal population, then we have here a possible method for 
uniquely determining the rotated position of the factor axes in this domain. 


An Example 


In order to illustrate the use of test-equivalents of persons as a guide 
to rotation of the factor axes, we may refer to Thurstone’s factor analysis 
of boxes.† Thurstone intercorrelated a set of 26 measurements of boxes
which involved three positively correlated parameters (x, y, and z). Three
factors were extracted from the correlation matrix R by the group centroid 
method. These factors were then rotated to an oblique structure, and the 
rotated factors corresponded in fact with the three box dimensions. 

In the present example, the factor loadings for the last 23 tests, derived 
from the group centroid analysis, will be taken as constituting a matrix of 
orthogonal factors $F_0$ (Table 1). The omitted measurements are in fact
x, y, and z. It will be shown that if appropriate boxes (persons) are chosen
and introduced into the test factor space in the form of their test-equivalents,
then they provide a basis for a rotation of the factor axes. In the present
case, the transformation matrix A is nearly identical with that found by
Thurstone. 

The three parameters x, y, and z, may be taken to correspond to height, 
length, and breadth, respectively. Then an investigator experienced in the 
domain of boxes may be able to say, on the basis of his experience, that there 


*Thurstone, L. L. Multiple-factor analysis. Chicago: Univ. Chicago Press, 1947, xii. 
†Ibid., Ch. XVI.













are three main types of boxes, the tall, the long, and the broad, and he may 
be able to choose typical examples of each. It may also happen that there 
exist “pathologically” tall, long, or broad boxes, outside the normal range 
of box sizes. The matrix $S_p$ shows the scores of three such boxes in tests 4
to 26 (Table 2). These scores, which would usually be obtained by direct
measurement, were derived in the present instance by arbitrarily giving the
tall box a score of 10 in x, the long box a score of 7 in y, and the broad box
a score of 3 in z. The scores of these three boxes in the 23 tests used were
taken as proportional to the correlations (listed by Thurstone) between x,
y, and z on the one hand, and tests 4 to 26 on the other.

Table 3 gives the matrix product $F_0' F_0$, and Table 4 gives $(F_0' F_0)^{-1}$.
The matrix $X = (F_0' F_0)^{-1} F_0' S_p$ is given in Table 5, and contains the co-
ordinates of the test-equivalents of the three selected persons in the test
factor space $F_0$. Normalizing the columns of X, we have the transformation
matrix A (Table 6), which may be used to obtain a new, rotated, position
of the factor axes.
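
The last step, from Table 5 to Table 6, is easy to retrace. The sketch below takes the entries of X as printed in Table 5 and divides each column by its length; the function name `normalize_columns` is introduced here only for illustration. The results agree with the entries of Table 6 to within rounding.

```python
import math

# Table 5: co-ordinates of the three test-equivalents
# (rows: factors I, II, III; columns: Tall, Long, Broad).
X = [[655.86, 518.63, 227.45],
     [-663.94, 361.70, 16.04],
     [308.37, 247.99, -186.99]]

def normalize_columns(m):
    """Divide every column by its Euclidean length (direction cosines)."""
    lengths = [math.sqrt(sum(v * v for v in col)) for col in zip(*m)]
    return [[v / lengths[j] for j, v in enumerate(row)] for row in m]

A = normalize_columns(X)
for label, row in zip(("I", "II", "III"), A):
    print(label.ljust(4), " ".join(f"{v:8.4f}" for v in row))
```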


TABLE 1
The Matrix $F_0$ of Orthogonal Box Factors

Test        I            II          III
  4    [illegible]      -.04         .48
  5        .88          -.40        -.24
  6        .89           .41        -.20
  7        .84          -.35         .43
  8        .86           .22         .43
  9        .83          -.00        -.03
 10        .85          -.26        -.44
 11        .86           .49        -.01
 12        .87           .29        -.38
 13       -.07          -.98        -.09
 14        .07           .98         .09
 15       -.05          -.55         .80
 16        .05           .55        -.80
 17        .00           .49         .85
 18        .00          -.49        -.85
 19        .86           .05         .48
 20        .87          -.39        -.32
 21        .90           .40        -.19
 22        .85           .05         .47
 23        .86      [illegible]     -.32
 24        .89           .39        -.16
 25        .99          -.01         .01












TABLE 2
The Scores of Three Selected Boxes in the Tests (box measurements)
Written in the Matrix $S_p$

Test     Tall        Long         Broad
  4      7.50        5.39          1.14
  5      7.60        2.59          2.40
  6      2.60        5.53          2.46
  7      9.00        4.13      [illegible]
  8      5.50        6.44          1.20
  9      8.90        2.31          1.89
 10      6.10        2.45          2.70
 11      2.40        6.23          2.04
 12      2.60        4.69          2.73
 13      5.90       -4.20         -0.15
 14     -5.90        4.20          0.15
 15      5.90       -0.21         -1.74
 16     -5.90        0.21          1.74
 17     -0.60        3.92         -1.56
 18      0.60       -3.92          1.56
 19      6.70        5.74          1.08
 20      7.20        2.45          2.43
 21      2.80        5.60          2.46
 22      6.60        5.74      [illegible]
 23      6.90        2.38          2.43
 24      2.90        5.60          2.37
 25      6.60    [illegible]       2.22
 26      5.60        5.25          2.22

TABLE 3
The Matrix $F_0' F_0$

           I          II         III
I       13.1537      .3087     -.1101
II        .3087     4.7958      .2247
III      -.1101      .2247     4.4963

















TABLE 4
The Matrix $(F_0' F_0)^{-1}$

           I          II         III
I        .07616    -.00500     .00212
II      -.00500     .20933    -.01058
III      .00212    -.01058     .22298

TABLE 5
The Matrix $X = (F_0' F_0)^{-1} F_0' S_p$

          Tall       Long       Broad
I        655.86     518.63     227.45
II      -663.94     361.70      16.04
III      308.37     247.99    -186.99

TABLE 6
The Transformation Matrix A

          Tall       Long       Broad
I        .6673      .7636      .7713
II      -.6755      .5325      .0544
III      .3137      .3651     -.6341





Manuscript received 4/23/51 


Revised manuscript received 8/6/51 























PSYCHOMETRIKA—VOL. 17, No. 2 
JUNE, 1952 


IBM COMPUTATION OF SUMS OF PRODUCTS FOR POSITIVE AND 
NEGATIVE NUMBERS 


PAUL J. BURKE
WORLD BOOK COMPANY 


The wiring of the plugboard of the IBM type 405 machine for the com- 
putation of sums of squares and cross products of positive and negative 
numbers is described. The method makes greater demands on the X distribu- 
tor and class selector capacity of the machine than does the method of wiring 
the plugboard when the numbers all have the same sign. 


The use of the IBM tabulator for obtaining statistics necessary for the 
computation of correlation coefficients has long been a routine matter. Sums 
of squares and of cross products are obtained by the method of progressive 
digiting or by some variation of it. When the series from which the statistics 
are to be calculated contain both positive and negative numbers, however, 
the method of wiring the machine is less obvious than when the numbers all 
have the same sign; and coding is often used to avoid negative numbers. 

Coding to avoid negative numbers may be unnecessary when its purpose 
is to facilitate the use of the IBM tabulator for obtaining sums of products. 
A method for wiring the type 405 machine for this problem is described 
below. Although essentially this method is used in some laboratories, it is 
not available, to the writer’s knowledge, in the psychological or educational 
literature. Therefore, the writer has thought it worth while to describe the 
method in detail. 

The fundamental idea of the method is to route the “wire to C” impulse
to cause a particular counter to add or subtract according to the combination
of “+” and “−” in the control column and variable being added in the
counter. The tabulation sheet or deck of summary cards obtained is
exactly like those ordinarily obtained with positive numbers only, with the
one slight exception that Σx appears in a separate counter position and is
given only for the control column on each tabulation. Σx² and Σxy are
obtained in exactly the same way as with positive numbers only.

The wiring of the plugboard, in addition to the usual wiring, is as follows:*
(It will be assumed that negative scores are X- or Y-punched in each column,
not just in the units position. Sorting is exactly the same as for positive
numbers only.)

*Essentially the same wiring, except for the use of two-position X distributors, is
briefly described by H. R. J. Grosch in “Harmonic Analysis by the Use of Progressive
Digiting,” Proceedings of the Educational Research Forum, 1946, p. 82.



1. Wire the upper brush of one position of each variable to the X pick-up 
hub of an X distributor. 

2. Wire the upper brush control column to the bus hubs. 

3. Wire from the bus hubs to 

(a) the X pick-up hub of a class selector and 
(b) a counter entry and thence from the corresponding counter list 
exit to the comparing relays. 

4. Wire the lower brush of the control column for group indication to the
comparing relays via a counter-list filter, and also to a counter entry. This
counter will give the algebraic sum of the control column, or Σx.

5. (a) “Wire to C” is wired to the C hubs of the X distributors controlled
by the variables (and also to such counters as are being used for
special purposes). “Wire to C” also goes to one C hub of the
control-column class selector and from the corresponding X and
NX hubs to the “−” and “+” impulse hubs of the counter giving
Σx.

(b) Split-wire each X hub of the X distributors controlled by the vari-
ables to an X and NX hub in different positions of the control-
column class selector, and split-wire also from each NX hub of
these X distributors to the corresponding NX and X hubs of the
class selector.

(c) Wire from the C hubs of the class selector to the “+” and “−”
hubs of the counters in the obvious way.

6. Other wiring, for printing, card count, subtract units position control,
etc., is as usual.

A wiring diagram of the selector network for a problem involving two
variables is shown in Figure 1.

[FIGURE 1. Wiring diagram of the selector network. Inputs are labeled “From Control Col.”, “From Plug to ‘C’”, “From Var. I”, and “From Var. II”.]

A test deck with the distributions shown in Table 1 was punched and the 
resulting tabulation sheet is shown in Table 2. 








TABLE 1
Distributions

Col. 2    Col. 4
  X         Y
 -3        -3
 -2         1
  2         2
  2         1
  2         0
  1         1
 -1        -2
  1        -1
  1         0
 -1        -1
  0         1

ΣX = 2    ΣY = -1    ΣXY = 16    ΣX² = 30    ΣY² = 23

TABLE 2 
Tabulation 
f xXx x. > 
Col. 2 1 3 3 
2 4 11 5
1 5 16 8 
1 16 9 
16 9 2 
Col. 4 1 3 3 
2 2 6 7
1 6 7 13 
3 10 13 99 
10 13 
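
The arithmetic that the wiring is meant to carry out can be sketched in a few lines. This is not a model of the plugboard; it is an illustration, under the assumption of single-digit scores as in the test deck, of progressive digiting with sign reversal on negative control scores. The function names are invented for the sketch. With the distributions of Table 1 it reproduces the sums stated there.

```python
def progressive_totals(control, variables):
    """Progressive digiting for single-digit signed scores.

    Cards are taken in descending order of |control|. When a card's control
    score is negative, every variable is entered with its sign reversed.
    Returns one row per digit group: (digit, card count, running totals...).
    """
    running = [0] * len(variables)
    rows = []
    for digit in range(max(abs(c) for c in control), -1, -1):
        count = 0
        for i, c in enumerate(control):
            if abs(c) != digit:
                continue
            count += 1
            sign = -1 if c < 0 else 1
            for k, var in enumerate(variables):
                running[k] += sign * var[i]
        rows.append((digit, count, *running))
    return rows

def sums_of_products(rows):
    """Add the progressive totals over the non-zero digit groups."""
    width = len(rows[0]) - 2
    return [sum(r[2 + k] for r in rows if r[0] > 0) for k in range(width)]

x = [-3, -2, 2, 2, 2, 1, -1, 1, 1, -1, 0]   # Col. 2 of Table 1
y = [-3, 1, 2, 1, 0, 1, -2, -1, 0, -1, 1]   # Col. 4 of Table 1

rows_x = progressive_totals(x, [x, y])   # control column: Col. 2
rows_y = progressive_totals(y, [x, y])   # control column: Col. 4

print(sums_of_products(rows_x))   # sum of squares of x, sum of cross products
print(sums_of_products(rows_y))   # sum of cross products, sum of squares of y
print(sum(x), sum(y))             # algebraic sums, kept in a separate counter
```

Summing the running totals over digit groups 3, 2, and 1 weights each card by the absolute value of its control score, and the sign reversal supplies the sign, so each card contributes control × variable.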


Manuscript received 4/25/51 
Revised manuscript received 11/11/51 
















PSYCHOMETRIC MONOGRAPHS 


Two new issues of the Psychometric Monograph Series are in press. 
No. 6 of this series will be Dimensions of Functional Psychosis by James W. 
Degan, $1.50, which is scheduled for release in November. No. 7 will be 
A Theory of Test Scores by Frederic Lord, $2.00, which is scheduled for 
release in the early fall. 


Previously published issues of the Psychometric Monograph Series are: 


Thurstone, L. L. Primary mental abilities.
Psychometric Monograph No. 1, $2.00.

Thurstone, L. L. and Thurstone, Thelma Gwinn. Factorial studies of
intelligence.
Psychometric Monograph No. 2, $1.59

Wolfle, Dael. Factor analysis to 1940.
Psychometric Monograph No. 3, $1.25.

Thurstone, L. L. A factorial study of perception.
Psychometric Monograph No. 4, $2.50.

French, John W. The description of aptitude and achievement tests in
terms of rotated factors.
Psychometric Monograph No. 5, $4.00.


Orders for Nos. 1 through 6 until December 31, 1952 should be sent to:
The University of Chicago Press
5750 Ellis Avenue
Chicago, Illinois


The September issue of Psychometrika will announce where to send all
orders for No. 7 and orders for Nos. 1 through 6 after December 31, 1952.














THE INTER-AMERICAN SOCIETY OF PSYCHOLOGY 


During the International Congress of Mental Health, held in Mexico 
City in December, 1951, the INTER-AMERICAN SOCIETY OF PSY- 
CHOLOGY was formed. The following officers were elected: President: 
Eduardo Krapf, University of Buenos Aires, Argentina; Vice-president: 
Werner Wolff, Bard College, Annandale-on-Hudson, New York; Secretary: 
Oswaldo Robles, University of Mexico; Treasurer: Hernan Vergara, Univer- 
sity of Bogota, Colombia; Associated Vice-presidents: W. Line, Canada; 
Enrique B. Roxo, Brazil; Carlos Nassar, Chile; Jaime Barrios Pena, Guate- 
mala. The society has its Latin-American office at the University of Mexico 
and its U. S. A. office at Bard College.

It is the purpose of the society to work toward Inter-American coopera- 
tion and mutual understanding by means of psychological collaboration on 
basic scientific, educational and socio-psychological issues. Among the specific 
aims are to organize an interchange of students and teachers; to found a 
bi-lingual journal on topical issues and opinion exchange; and to establish a 
film library on basic psychological issues. 

The society will hold annual meetings, the first of which will be in Caracas, 
Venezuela, in December 1952. 

For an exchange of psychological studies an inter-American library will 
be established in the offices of Mexico and the U. S. A. It would be greatly
appreciated if authors would send copies of their works to the Secretary’s 
office in Mexico: Dr. Oswaldo Robles, Facultad de Filosofia y Letras de la 
Universidad de Mexico, San Cosme 71, Mexico D. F. 

The annual membership fee is $5.00. All funds shall be used to finance 
the Congress, to establish an Inter-American Journal of Psychology, and to 
found the film library. 

Applications for membership of American psychologists, accompanied 
by a curriculum vitae, should be sent in triplicate to the Vice-president, Dr. 
Werner Wolff, Bard College, Annandale-on-Hudson, New York. 













PREPARATION OF PROBLEM AND SOURCE MATERIALS FOR THE 
MATHEMATICAL TRAINING OF SOCIAL SCIENTISTS 


As readers of this journal probably know, a Committee on the Mathe- 
matical Training of Social Scientists has been at work for some time. The 
Committee includes representatives from the following associations and 
societies: American Anthropological Association, American Economics Asso- 
ciation, American Educational Research Association, American Farm Eco- 
nomics Association, American Political Science Association, American 
Psychological Association, American Sociological Society, American Statis- 
tical Association, Econometric Society, Institute of Mathematical Statistics, 
Mathematical Association of America, and Psychometric Society. 

As the result of a suggestion from this Committee, the Social Science 
Research Council is now sponsoring a small group to work during the summer 
of 1952. This group will attempt to compile from the literature of the various 
social sciences lists of problems, extracts from sources, and references to 
sources that illustrate varieties of uses of mathematics in the social sciences. 
These compilations are expected to serve a number of important ends—e.g., 
to provide mathematicians with material for use in texts and courses designed 
for social scientists, to indicate the general dimensions of the mathematical 
training appropriate for students of the social sciences now and in the future, 
and to facilitate the study of mathematics by social scientists for whom 
organized courses are not available. 

This Committee believes that the group referred to would find it most 
helpful if it could have a wide variety of suggestions from the various areas 
concerned. A general appeal for such suggestions is hereby made. They 
should be sent to Professor William G. Madow, Chairman, Committee on the 
Mathematical Training of Social Scientists, Baker Library, Hanover, New 
Hampshire up to August 15; and thereafter University of Illinois, Urbana, 
Illinois. 

Although the Committee does not wish to limit the suggestions to specific 
types of material, it would prefer greater emphasis on materials relating to the 
use of mathematics in the social sciences themselves than on those relating to 
statistics, since the materials necessary for statistics are better known. More- 
over, the Committee would suggest that those who respond not concern them- 
selves with questions of duplication of what others would say, but give as 
much information as possible. This first request for assistance is aimed at 
providing those who are interested in this subject with an opportunity to make 
their views known to the Committee in as general terms as they wish. 

Finally, the Committee would appreciate learning where programs of 
mathematical training intended for social scientists are now in existence or in 
process of development, and where mathematics at the level of the calculus
or higher is required for undergraduate or graduate degrees in the social
sciences or may be substituted for another requirement for a degree in a social
science.


Editor’s Note: It is regretted that, because of publication delay of this
issue, the foregoing notice did not come to the attention of the readers at a 
time when it would have been most useful. It seems probable that Professor 
Madow still would like to receive suggestions, however. 

















